Lightweight dataflow library for mechanistic interpretability.

Krnel-graph

A lightweight Python library for building strongly typed content-addressable computation graphs, especially for mechanistic interpretability research.

Think of Krnel-graph as "git for ML data transformations": every operation has a content hash of its parameters and dependencies, so results are cached and any computation can be reproduced exactly. Because the graph is strongly typed, all operations are serializable and easy to discover for you, your editor, and the agents you use.

Krnel-graph separates specification from implementation. Each operation's definition contains everything needed to materialize that operation, and each Runner can implement each operation differently. This lets you swap in different backends, dataflow executors, orchestrators, etc.

TODO pretty figure, showing:
- a nice graph, including custom ops
- a HuggingfaceRunner() underneath
- a NVidiaNemoRunner()
- notebook w/ experiment results

We've tested krnel-graph on the following platforms:

macOS (arm64, MPS)
Linux (amd64, CUDA)
Windows native (amd64, CUDA)
Windows WSL2 (amd64, CUDA)

Quick start

Installation from PyPI with uv:

$ uv add 'krnel-graph[cli,ml]'   # quoted so shells like zsh don't expand the brackets

# (Optional) Configure where Runner() saves results
# Defaults to /tmp
$ uv run krnel-graph config --store-uri /tmp/krnel/
# s3://, gs://, or any other fsspec-supported URL also works

Make main.py with the following definitions:

from krnel.graph import Runner
runner = Runner()

# Load data
ds_train   = runner.from_parquet('data_train.parquet')
col_prompt = ds_train.col_text("prompt")
col_label  = ds_train.col_categorical("label")

# Get activations from a small model
X_train = col_prompt.llm_layer_activations(
    model="hf:gpt2",
    layer=-1,
)

# Train a probe on contrastive examples
train_positives = col_label.is_in({"positive_label_1", "positive_label_2"})
train_negatives = ~train_positives
probe = X_train.train_classifier(
    positives=train_positives,
    negatives=train_negatives,
)

# Get test activations by substituting training set with testing set
# (no need to repeat the entire graph)
ds_test = runner.from_parquet('data_test.parquet')
X_test = X_train.subs((ds_train, ds_test))

test_scores = probe.predict(X_test)
eval_result = test_scores.evaluate(
    gt_positives=train_positives.subs((ds_train, ds_test)),
    gt_negatives=train_negatives.subs((ds_train, ds_test)),
)

if __name__ == "__main__":
    # All operations are lazily evaluated until materialized:
    print(runner.to_json(eval_result))

Then, inspect the results in a notebook:

from main import runner, eval_result, X_train

# Materialize everything and print result:
print(runner.to_json(eval_result))

# Display activations of training set (GPU-intense operation)
print(runner.to_numpy(X_train))

Or use the (completely optional) krnel-graph CLI to materialize a selection of operations and/or monitor progress:

# Run parts of the graph
$ uv run krnel-graph run -f main.py -t LLMLayerActivations   # By operation type
$ uv run krnel-graph run -f main.py -s X_train               # By Python variable name

# Show status
$ uv run krnel-graph summary -f main.py

# Diff the pseudocode of two graph operations
$ uv run krnel-graph print -f main.py -s X_train > /tmp/train.txt
$ uv run krnel-graph print -f main.py -s X_test > /tmp/test.txt
$ git diff --no-index /tmp/train.txt /tmp/test.txt
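
The `subs` call in main.py swaps one dataset for another without rebuilding the graph by hand. As a rough illustration of how such a substitution could work on an immutable graph, here is a minimal sketch. The `Node`, `Parquet`, `TextColumn`, and `Activations` classes are hypothetical stand-ins written for this example, not krnel-graph's own types, and the real implementation may differ.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class Node:
    """Hypothetical base class for immutable graph nodes."""

    def subs(self, pair):
        """Return a copy of this graph with `old` replaced by `new` everywhere."""
        old, new = pair
        if self == old:
            return new
        # Rebuild this node, substituting recursively inside any child nodes.
        changes = {
            f.name: getattr(self, f.name).subs(pair)
            for f in fields(self)
            if isinstance(getattr(self, f.name), Node)
        }
        return replace(self, **changes)


@dataclass(frozen=True)
class Parquet(Node):
    path: str


@dataclass(frozen=True)
class TextColumn(Node):
    dataset: Node
    name: str


@dataclass(frozen=True)
class Activations(Node):
    text: Node
    model: str
    layer: int


ds_train = Parquet("data_train.parquet")
ds_test = Parquet("data_test.parquet")
x_train = Activations(TextColumn(ds_train, "prompt"), model="hf:gpt2", layer=-1)

# Same model and layer, different dataset; x_train itself is left untouched.
x_test = x_train.subs((ds_train, ds_test))
print(x_test.text.dataset.path)
```

Because nodes are frozen, substitution always builds new nodes, so the training graph and the test graph can coexist and be compared or diffed.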

What this library is

Krnel-graph is a content-addressable dataflow library that provides:

✅ An extensible palette of mechanistic interpretability operations for training, running, and evaluating linear probes on existing datasets in batch...
- Excellent editor support via autocomplete, type hints, docstrings, etc.
✅ ...alongside a reference implementation of these operations, with optional integrations to Huggingface, TransformerLens, Ollama, and other inference fabric...
✅ ...all built on top of a lightweight computation graph flow library, featuring:
- Built-in model and data provenance via automatic dependency tracking
- Cached, reproducible results through content-addressable operations
- Immutable operation specifications with deterministic UUIDs
- Fluent API for building complex data pipelines
- ML-first design with built-in support for embeddings, classifiers, and LLMs
- (Optional) Local execution with Arrow/Parquet storage (filesystem / GCS / S3 / ...)
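
To make "cached, reproducible results through content-addressable operations" concrete, here is a minimal sketch of a result store keyed by a hash of an operation's name and parameters. `ContentStore`, `materialize`, and `scale` are hypothetical names used only for illustration; krnel-graph's actual store and hashing scheme are not shown here.

```python
import hashlib
import json


class ContentStore:
    """Hypothetical in-memory store: results are keyed by a content hash."""

    def __init__(self):
        self._results = {}
        self.compute_calls = 0

    @staticmethod
    def key_for(op_name, params):
        # sort_keys makes the hash independent of dict insertion order.
        blob = json.dumps({"op": op_name, "params": params}, sort_keys=True)
        return hashlib.sha256(blob.encode("utf-8")).hexdigest()

    def materialize(self, op_name, params, compute):
        key = self.key_for(op_name, params)
        if key not in self._results:
            # Only compute when this exact specification has not been seen.
            self.compute_calls += 1
            self._results[key] = compute(**params)
        return self._results[key]


def scale(values, factor):
    return [v * factor for v in values]


store = ContentStore()
first = store.materialize("scale", {"values": [1, 2, 3], "factor": 2}, scale)
second = store.materialize("scale", {"factor": 2, "values": [1, 2, 3]}, scale)  # cache hit
third = store.materialize("scale", {"values": [1, 2, 3], "factor": 3}, scale)   # new key
print(first, third, store.compute_calls)  # [2, 4, 6] [3, 6, 9] 2
```

The second call reuses the stored result even though the parameters were written in a different order, while changing a single parameter produces a new key and a fresh computation.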

What this library is not

❌ ...a task orchestrator like Airflow or Prefect
- No YAML templates, no Docker containers (by default)
❌ ...a distributed computing framework like Dask or Ray
- The default runner uses local-only execution for now
- Results can be saved and loaded to a remote store (NFS, GCS/S3, ...)
- Bring your own scheduling / workflow management if needed
❌ ...an experimentation or visualization tool (though it integrates nicely with notebooks and plotting libraries)

The goal of krnel-graph is to separate well-typed specifications from their implementations. Because Krnel-graph does not depend on any particular infrastructure, it's easy to swap in your own dataflow executor if you prefer.

Core Concepts

OpSpec: Content-Addressable Operations

Every operation in Krnel is an OpSpec: an immutable specification with a deterministic UUID.

from krnel.graph import LoadInlineJsonDatasetOp

# These two operations have identical UUIDs
op1 = LoadInlineJsonDatasetOp(data={'x': [1, 2, 3]})
op2 = LoadInlineJsonDatasetOp(data={'x': [1, 2, 3]})
assert op1.uuid == op2.uuid
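
One plausible way to obtain a deterministic identifier is to hash a canonical serialization of the operation's type, parameters, and the identifiers of its dependencies. The sketch below assumes SHA-256 over sorted-key JSON, truncated to 16 bytes; `content_uuid` and the op names are hypothetical, and krnel-graph's actual derivation may differ.

```python
import hashlib
import json
import uuid


def content_uuid(op_type, params, dependencies=()):
    """Derive a deterministic UUID from an op's type, params, and dependency UUIDs."""
    payload = json.dumps(
        {"type": op_type, "params": params, "deps": [str(d) for d in dependencies]},
        sort_keys=True,
        separators=(",", ":"),
    )
    digest = hashlib.sha256(payload.encode("utf-8")).digest()
    return uuid.UUID(bytes=digest[:16])  # keep the first 128 bits of the hash


a = content_uuid("LoadInlineJson", {"data": {"x": [1, 2, 3]}})
b = content_uuid("LoadInlineJson", {"data": {"x": [1, 2, 3]}})
c = content_uuid("LoadInlineJson", {"data": {"x": [1, 2, 4]}})

# A downstream op mixes in its dependency's UUID, so upstream changes propagate.
downstream_a = content_uuid("SelectColumn", {"column": "x"}, (a,))
downstream_c = content_uuid("SelectColumn", {"column": "x"}, (c,))

print(a == b, a == c, downstream_a == downstream_c)  # True False False
```

Including dependency identifiers in the hash is what gives the "git-like" behavior: changing any input changes the identity of everything computed from it.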

Krnel uses a type-driven fluent API where each column type provides relevant methods:

dataset = LoadInlineJsonDatasetOp(data={
    'text': ['Hello', 'World'],
    'embeddings': [[0.1, 0.2], [0.3, 0.4]],
    'labels': ['A', 'B']
})

# Type-specific operations
text_col = dataset.col_text('text')           # TextColumnType
vector_col = dataset.col_vector('embeddings')  # VectorColumnType
category_col = dataset.col_categorical('labels') # CategoricalColumnType

# Chaining operations
# (training arguments elided; `some_other_vector_col` is another vector column)
predictions = vector_col.train_classifier(...).predict(some_other_vector_col)
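
The idea behind a type-driven fluent API can be sketched in plain Python: each column type is a class carrying only the methods that make sense for it, and each method returns a new operation node of the appropriate type. The classes and methods below (`TextCol`, `VectorCol`, `embed`, `normalize`, and so on) are hypothetical and exist only to illustrate the pattern, not to mirror krnel-graph's real class hierarchy.

```python
from dataclasses import dataclass


class TextCol:
    """Hypothetical base for ops that produce text; only text methods live here."""

    def embed(self, model: str) -> "EmbedOp":
        return EmbedOp(text=self, model=model)


class VectorCol:
    """Hypothetical base for ops that produce vectors; only vector methods live here."""

    def normalize(self) -> "NormalizeOp":
        return NormalizeOp(vectors=self)


@dataclass(frozen=True)
class SelectText(TextCol):
    dataset: str
    column: str


@dataclass(frozen=True)
class EmbedOp(VectorCol):
    text: TextCol
    model: str


@dataclass(frozen=True)
class NormalizeOp(VectorCol):
    vectors: VectorCol


# Each call returns a typed node, so editors can autocomplete the next valid step.
pipeline = SelectText("reviews", "text").embed("some-model").normalize()
print(type(pipeline).__name__, hasattr(pipeline, "embed"))  # NormalizeOp False
```

Since `embed` is defined only on text columns, a type checker or editor can flag an attempt to embed something that is already a vector before anything runs.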

Runners: Execution Engines

Runners execute your computation graphs.

from krnel.graph.runners.local_runner import LocalArrowRunner

# This runner saves into local memory:
runner = LocalArrowRunner(store_uri="memory://")

# Different output formats
arrow_table = runner.to_arrow(my_operation)
numpy_array = runner.to_numpy(my_operation)
json_data = runner.to_json(my_operation)

# The default runner can be configured via `krnel-graph config`
from krnel.graph import Runner
runner = Runner()

Writing custom operations

1. Define your operation class

from krnel.graph import OpSpec
from krnel.graph.types import TextColumnType, VectorColumnType

class MyCustomEmbeddingOp(VectorColumnType):
    """Extract embeddings using a custom model."""
    text_input: TextColumnType
    model_path: str
    max_length: int = 512

2. Implement the execution logic

from krnel.graph.runners.local_runner import LocalArrowRunner
import pyarrow as pa

# Dispatch happens by type annotation:
@LocalArrowRunner.implementation
def my_custom_embedding_impl(runner, op: MyCustomEmbeddingOp):
    """Implementation that gets called when this op is executed."""

    # Get input data
    text_data = runner.to_arrow(op.text_input)
    texts = text_data.column(0).to_pylist()

    # Your custom logic here
    embeddings = []
    for text in texts:
        # Load your model, extract embeddings, etc.
        embedding = extract_embedding(text, op.model_path, op.max_length)
        embeddings.append(embedding)

    runner.write_arrow(op, pa.array(embeddings))

def extract_embedding(text: str, model_path: str, max_length: int):
    # Your embedding extraction logic
    return [0.1, 0.2, 0.3]  # placeholder
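
For readers curious how "dispatch by type annotation" can work, here is a minimal sketch of a decorator that reads the annotated parameter of an implementation function and registers it under that op class. `BaseRunner`, `SplitRunner`, `RegexRunner`, and `WordCountOp` are hypothetical names for this example; this is not how `LocalArrowRunner.implementation` is necessarily written.

```python
import re
import typing
from dataclasses import dataclass


class BaseRunner:
    """Hypothetical runner base: each subclass gets its own implementation registry."""

    _implementations: dict = {}

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        cls._implementations = {}

    @classmethod
    def implementation(cls, func):
        # Read the single annotated parameter to learn which op this function handles.
        hints = typing.get_type_hints(func)
        hints.pop("return", None)
        (op_type,) = hints.values()
        cls._implementations[op_type] = func
        return func

    def run(self, op):
        # Walk the op's class hierarchy so subclasses can reuse a parent's implementation.
        for klass in type(op).__mro__:
            func = self._implementations.get(klass)
            if func is not None:
                return func(self, op)
        raise NotImplementedError(f"{type(self).__name__} cannot run {type(op).__name__}")


class SplitRunner(BaseRunner):
    pass


class RegexRunner(BaseRunner):
    pass


@dataclass(frozen=True)
class WordCountOp:
    text: str


@SplitRunner.implementation
def count_by_split(runner, op: WordCountOp):
    return len(op.text.split())


@RegexRunner.implementation
def count_by_regex(runner, op: WordCountOp):
    return len(re.findall(r"\w+", op.text))


op = WordCountOp("well-typed specs")
print(SplitRunner().run(op), RegexRunner().run(op))  # 2 3
```

The same specification yields different results under the two runners, which illustrates the separation of specification from implementation: the op only describes what to compute, and each runner decides how.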

3. Use your custom operation

from krnel.graph import LoadInlineJsonDatasetOp

dataset = LoadInlineJsonDatasetOp(data={'text': ['Hello world', 'Custom ops!']})
text_col = dataset.col_text('text')

# Using your custom operation
embeddings = text_col.my_custom_embedding(
    model_path='./my-model',
    max_length=256
)

result = runner.to_numpy(embeddings)

Project details

Release history

- 0.1.8: Feb 23, 2026
- 0.1.7: Oct 17, 2025
- 0.1.6: Oct 1, 2025 (this version)
- 0.1.5: Oct 1, 2025 (yanked; reason: "this shouldn't have been published")
- 0.1.4: Oct 1, 2025
- 0.1.3: Oct 1, 2025
- 0.1.2: Oct 1, 2025
- 0.1.1: Sep 16, 2025 (yanked; reason: "we are still working on the release")

Download files

Source Distribution

krnel_graph-0.1.6.tar.gz (78.4 kB view details)

Uploaded Oct 1, 2025 Source

Built Distribution

krnel_graph-0.1.6-py3-none-any.whl (62.7 kB view details)

Uploaded Oct 1, 2025 Python 3

File details

Details for the file krnel_graph-0.1.6.tar.gz.

File metadata

Download URL: krnel_graph-0.1.6.tar.gz
Upload date: Oct 1, 2025
Size: 78.4 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for krnel_graph-0.1.6.tar.gz
Algorithm	Hash digest
SHA256	`91a1fb7b9e24f0a8c3cbc92d1642793561a4791730289b402d01d1954062b7f8`
MD5	`fae3fa4baa3c8378be8575e61a8613ff`
BLAKE2b-256	`163b1406cccae346bc006a8018f23231cfa3016c1b5aec4e1401842b0f0cd2b3`

Provenance

The following attestation bundles were made for krnel_graph-0.1.6.tar.gz:

Publisher: publish-to-pypi.yml on krnel-ai/krnel-graph

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: krnel_graph-0.1.6.tar.gz
- Subject digest: 91a1fb7b9e24f0a8c3cbc92d1642793561a4791730289b402d01d1954062b7f8
- Sigstore transparency entry: 576765863
- Sigstore integration time: Oct 1, 2025
Source repository:
- Permalink: krnel-ai/krnel-graph@df21327a29a65209b7b2187f2f16781d4dc8f15c
- Branch / Tag: refs/tags/v0.1.6
- Owner: https://github.com/krnel-ai
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-to-pypi.yml@df21327a29a65209b7b2187f2f16781d4dc8f15c
- Trigger Event: release

File details

Details for the file krnel_graph-0.1.6-py3-none-any.whl.

File metadata

Download URL: krnel_graph-0.1.6-py3-none-any.whl
Upload date: Oct 1, 2025
Size: 62.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for krnel_graph-0.1.6-py3-none-any.whl
Algorithm	Hash digest
SHA256	`c1460daf6ba381f36c2d970995ef4cbfad3af518daf9cf30a6325d1b3663792d`
MD5	`5ec8f102f6e5a51f796c415fd58edaed`
BLAKE2b-256	`3580259d7bab1866d36fcf0c26cc926de87617f2747b64df5018c0db7ec76eb3`

Provenance

The following attestation bundles were made for krnel_graph-0.1.6-py3-none-any.whl:

Publisher: publish-to-pypi.yml on krnel-ai/krnel-graph

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: krnel_graph-0.1.6-py3-none-any.whl
- Subject digest: c1460daf6ba381f36c2d970995ef4cbfad3af518daf9cf30a6325d1b3663792d
- Sigstore transparency entry: 576765884
- Sigstore integration time: Oct 1, 2025
Source repository:
- Permalink: krnel-ai/krnel-graph@df21327a29a65209b7b2187f2f16781d4dc8f15c
- Branch / Tag: refs/tags/v0.1.6
- Owner: https://github.com/krnel-ai
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-to-pypi.yml@df21327a29a65209b7b2187f2f16781d4dc8f15c
- Trigger Event: release

krnel-graph 0.1.6
