Skip to main content

Kiln AI

Project description

Kiln AI Core Library

Kiln AI Logo

PyPI - Version PyPI - Python Version Docs


Installation

pip install kiln_ai

About

This package is the Kiln AI core library. There is also a separate desktop application and server package. Learn more about Kiln AI at kiln.tech and on Github: github.com/Kiln-AI/kiln.

Guide: Using the Kiln Python Library

In this guide we'll walk common examples of how to use the library.

Documentation

The library has a comprehensive set of docs.

Table of Contents

Installation

pip install kiln-ai

Connecting AI Providers (OpenAI, OpenRouter, Ollama, etc)

The easiest way to connect AI providers is to use the Kiln app UI. Once connected in the UI, credentials will be stored to ~/.kiln_ai/settings.yml, which will be available to the library.

For configuring credentials from code or connecting custom servers/model, see Adding Custom Model or AI Provider from Code.

Using the Kiln Data Model

Understanding the Kiln Data Model

Kiln projects are simply a directory of files (mostly JSON files with the extension .kiln) that describe your project, including tasks, runs, and other data.

This dataset design was chosen for several reasons:

  • Git compatibility: Kiln project folders are easy to collaborate on in git. The filenames use unique IDs to avoid conflicts and allow many people to work in parallel. The files are small and easy to compare using standard diff tools.
  • JSON allows you to easily load and manipulate the data using standard tools (pandas, polars, etc)

The Kiln Python library provides a set of Python classes that which help you easily interact with your Kiln dataset. Using the library to load and manipulate your dataset is the fastest way to get started, and will guarantees you don't insert any invalid data into your dataset. There's extensive validation when using the library, so we recommend using it to load and manipulate your dataset over direct JSON manipulation.

Datamodel Overview

Here's a high level overview of the Kiln datamodel. A project folder will reflect this nested structure:

  • Project: a Kiln Project that organizes related tasks
    • Task: a specific task including prompt instructions, input/output schemas, and requirements
      • TaskRun: a sample (run) of a task including input, output and human rating information
      • Finetune: configuration and status tracking for fine-tuning models on task data
      • Prompt: a prompt for this task
      • DatasetSplit: a frozen collection of task runs divided into train/test/validation splits

Load a Project

Assuming you've created a project in the Kiln UI, you'll have a project.kiln file in your ~/Kiln Projects/Project Name directory.

from kiln_ai.datamodel import Project

project = Project.load_from_file("path/to/your/project.kiln")
print("Project: ", project.name, " - ", project.description)

# List all tasks in the project, and their dataset sizes
tasks = project.tasks()
for task in tasks:
    print("Task: ", task.name, " - ", task.description)
    print("Total dataset size:", len(task.runs()))

Load an Existing Dataset into a Kiln Task Dataset

If you already have a dataset in a file, you can load it into a Kiln project.

Important: Kiln will validate the input and output schemas, and ensure that each datapoint in the dataset is valid for this task.

  • Plaintext input/output: ensure "output_json_schema" and "input_json_schema" not set in your Task definition.
  • JSON input/output: ensure "output_json_schema" and "input_json_schema" are valid JSON schemas in your Task definition. Every datapoint in the dataset must be valid JSON fitting the schema.

Here's a simple example of how to load a dataset into a Kiln task:

import kiln_ai
import kiln_ai.datamodel

# Created a project and task via the UI and put its path here
task_path = "/Users/youruser/Kiln Projects/test project/tasks/632780983478 - Joke Generator/task.kiln"
task = kiln_ai.datamodel.Task.load_from_file(task_path)

# Add data to the task - loop over you dataset and run this for each item
item = kiln_ai.datamodel.TaskRun(
    parent=task,
    input='{"topic": "AI"}',
    output=kiln_ai.datamodel.TaskOutput(
        output='{"setup": "What is AI?", "punchline": "content_here"}',
    ),
)
item.save_to_file()
print("Saved item to file: ", item.path)

And here's a more complex example of how to load a dataset into a Kiln task. This example sets the source of the data (human in this case, but you can also set it be be synthetic), the created_by property, and a 5-star rating.

import kiln_ai
import kiln_ai.datamodel

# Created a project and task via the UI and put its path here
task_path = "/Users/youruser/Kiln Projects/test project/tasks/632780983478 - Joke Generator/task.kiln"
task = kiln_ai.datamodel.Task.load_from_file(task_path)

# Add data to the task - loop over you dataset and run this for each item
item = kiln_ai.datamodel.TaskRun(
    parent=task,
    input='{"topic": "AI"}',
    input_source=kiln_ai.datamodel.DataSource(
        type=kiln_ai.datamodel.DataSourceType.human,
        properties={"created_by": "John Doe"},
    ),
    output=kiln_ai.datamodel.TaskOutput(
        output='{"setup": "What is AI?", "punchline": "content_here"}',
        source=kiln_ai.datamodel.DataSource(
            type=kiln_ai.datamodel.DataSourceType.human,
            properties={"created_by": "Jane Doe"},
        ),
        rating=kiln_ai.datamodel.TaskOutputRating(
            value=5,
            type=kiln_ai.datamodel.datamodel_enums.five_star,
        ),
    ),
)
item.save_to_file()
print("Saved item to file: ", item.path)

Using your Kiln Dataset in a Notebook or Project

You can use your Kiln dataset in a notebook or project by loading the dataset into a pandas dataframe.

import kiln_ai
import kiln_ai.datamodel

# Created a project and task via the UI and put its path here
task_path = "/Users/youruser/Kiln Projects/test project/tasks/632780983478 - Joke Generator/task.kiln"
task = kiln_ai.datamodel.Task.load_from_file(task_path)

runs = task.runs()
for run in runs:
    print(f"Input: {run.input}")
    print(f"Output: {run.output.output}")

print(f"Total runs: {len(runs)}")

Using Kiln Dataset in Pandas

You can also use your Kiln dataset in a pandas dataframe, or a similar script for other tools like polars.

import glob
import json
import pandas as pd
from pathlib import Path

task_dir = "/Users/youruser/Kiln Projects/test project/tasks/632780983478 - Joke Generator"
dataitem_glob = task_dir + "/runs/*/task_run.kiln"

dfs = []
for file in glob.glob(dataitem_glob):
    js = json.loads(Path(file).read_text())

    df = pd.DataFrame([{
        "input": js["input"],
        "output": js["output"]["output"],
    }])

    # Alternatively: you can use pd.json_normalize(js) to get the full json structure
    # df = pd.json_normalize(js)
    dfs.append(df)
final_df = pd.concat(dfs, ignore_index=True)
print(final_df)

Building and Running a Kiln Task from Code

# Step 1: Create or Load a Task -- choose one of the following 1.A or 1.B

# Step 1.A: Optionally load an existing task from disk
# task = datamodel.Task.load_from_file("path/to/task.kiln")

# Step 1.B: Create a new task in code, without saving to disk.
task = datamodel.Task(
    name="test task",
    instruction="Tell a joke, given a subject.",
)
# replace with a valid JSON schema https://json-schema.org for your task (json string, not a python dict).
# Or delete this line to use plaintext output
task.output_json_schema = json_joke_schema

# Step 2: Create an Adapter to run the task, with a specific model and provider
adapter = adapter_for_task(task, model_name="llama_3_1_8b", provider="groq")

# Step 3: Invoke the Adapter to run the task
task_input = "cows"
response = await adapter.invoke(task_input)
print(f"Output: {response.output.output}")

# Step 4 (optional): Load the task from disk and print the results.
#  This will only work if the task was loaded from disk, or you called task.save_to_file() before invoking the adapter (ephemeral tasks don't save their result to disk)
task = datamodel.Task.load_from_file(tmp_path / "test_task.kiln")
for run in task.runs():
    print(f"Run: {run.id}")
    print(f"Input: {run.input}")
    print(f"Output: {run.output}")

Tagging Task Runs Programmatically

You can also tag your Kiln Task runs programmatically:

# Load your Kiln Task from disk
task_path = "/Users/youruser/Kiln Projects/test project/tasks/632780983478 - Joke Generator/task.kiln"
task = kiln_ai.datamodel.Task.load_from_file(task_path)

for run in task.runs():
    # Parse the task output from JSON
    output = json.loads(run.output.output)

    # Add a tag if the punchline is unusually short
    if len(output["punchline"]) < 100:
        run.tags.append("very_short")
        run.save_to_file()  # Persist the updated tags

Adding Custom Model or AI Provider from Code

You can add additional AI models and providers to Kiln.

See our docs for more information, including how to add these from the UI:

You can also add these from code. The kiln_ai.utils.Config class helps you manage the Kiln config file (stored at ~/.kiln_settings/config.yaml):

# Addding an OpenAI compatible provider
name = "CustomOllama"
base_url = "http://localhost:1234/api/v1"
api_key = "12345"
providers = Config.shared().openai_compatible_providers or []
existing_provider = next((p for p in providers if p["name"] == name), None)
if existing_provider:
    # skip since this already exists
    return
providers.append(
    {
        "name": name,
        "base_url": base_url,
        "api_key": api_key,
    }
)
Config.shared().openai_compatible_providers = providers
# Add a custom model ID to an existing provider.
new_model = "openai::gpt-3.5-turbo"
custom_model_ids = Config.shared().custom_models
existing_model = next((m for m in custom_model_ids if m == new_model), None)
if existing_model:
    # skip since this already exists
    return
custom_model_ids.append(new_model)
Config.shared().custom_models = custom_model_ids

Taking Kiln RAG to production

When you're ready to deploy your RAG system, you can export your processed documents to any vector store supported by LlamaIndex. This allows you to use your Kiln-configured chunking and embedding settings in production.

Load a LlamaIndex Vector Store

Kiln provides a VectorStoreLoader that yields your processed document chunks as LlamaIndex TextNode objects. These nodes contain the same metadata, chunking and embedding data as your Kiln Search Tool configuration.

from kiln_ai.datamodel import Project
from kiln_ai.datamodel.rag import RagConfig
from kiln_ai.adapters.vector_store_loaders import VectorStoreLoader

# Load your project and RAG configuration
project = Project.load_from_file("path/to/your/project.kiln")
rag_config = RagConfig.from_id_and_parent_path("rag-config-id", project.path)

# Create the loader
loader = VectorStoreLoader(project=project, rag_config=rag_config)

# Export chunks to any LlamaIndex vector store
async for batch in loader.iter_llama_index_nodes(batch_size=10):
    # Insert into your chosen vector store
    # Examples: LanceDB, Pinecone, Chroma, Qdrant, etc.
    pass

Supported Vector Stores: LlamaIndex supports 20+ vector stores including LanceDB, Pinecone, Weaviate, Chroma, Qdrant, and more. See the full list.

Example: LanceDB Cloud

Internally Kiln uses LanceDB. By using LanceDB cloud you'll get the same indexing behaviour as in app.

Here's a complete example using LanceDB Cloud:

from kiln_ai.datamodel import Project
from kiln_ai.datamodel.rag import RagConfig
from kiln_ai.datamodel.vector_store import VectorStoreConfig
from kiln_ai.adapters.vector_store_loaders import VectorStoreLoader
from kiln_ai.adapters.vector_store.lancedb_adapter import lancedb_construct_from_config

# Load configurations
project = Project.load_from_file("path/to/your/project.kiln")
rag_config = RagConfig.from_id_and_parent_path("rag-config-id", project.path)
vector_store_config = VectorStoreConfig.from_id_and_parent_path(
    rag_config.vector_store_config_id, project.path,
)

# Create LanceDB vector store
lancedb_store = lancedb_construct_from_config(
    vector_store_config=vector_store_config,
    uri="db://my-project",
    api_key="sk_...",
    region="us-east-1",
    table_name="my-documents",  # Created automatically
)

# Export and insert your documents
loader = VectorStoreLoader(project=project, rag_config=rag_config)
async for batch in loader.iter_llama_index_nodes(batch_size=100):
    await lancedb_store.async_add(batch)

print("Documents successfully exported to LanceDB!")

After export, query your data using LlamaIndex or the LanceDB client.

Deploy RAG without LlamaIndex

While Kiln is designed for deploying to LlamaIndex, you don't need to use it. The iter_llama_index_nodes returns a TextNode object which includes all the data you need to build a RAG index in any stack: embedding, text, document name, chunk ID, etc.

Full API Reference

The library can do a lot more than the examples we've shown here.

See the full API reference in the docs under the Submodules section of the sidebar.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

kiln_ai-0.23.0.tar.gz (1.4 MB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

kiln_ai-0.23.0-py3-none-any.whl (559.6 kB view details)

Uploaded Python 3

File details

Details for the file kiln_ai-0.23.0.tar.gz.

File metadata

  • Download URL: kiln_ai-0.23.0.tar.gz
  • Upload date:
  • Size: 1.4 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.8.3

File hashes

Hashes for kiln_ai-0.23.0.tar.gz
Algorithm Hash digest
SHA256 fa1c059bd76a5f53ec6000c106763a68108f3f944597b1db8712d50c7feec7a9
MD5 82b6972f6200065732c65108d3bb73c9
BLAKE2b-256 adfc4d1f849003b881db93bd26aa54844ecdffa2555ee13d89de35247d9f1584

See more details on using hashes here.

File details

Details for the file kiln_ai-0.23.0-py3-none-any.whl.

File metadata

  • Download URL: kiln_ai-0.23.0-py3-none-any.whl
  • Upload date:
  • Size: 559.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.8.3

File hashes

Hashes for kiln_ai-0.23.0-py3-none-any.whl
Algorithm Hash digest
SHA256 b3ad6d6c146b49ce13dd1dac079376852515453e4019aa0e4af050eb3c201b6b
MD5 42384b6fb48025faf1237e63bb962889
BLAKE2b-256 8f7b8bae44c25cb3d50a236ef50d922153fe280c77b1ed4c29a7375767d3067b

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page