
Lightweight dataflow library for mechanistic interpretability.


Krnel-graph


A Python toolbox for mechanistic interpretability research, built on a lightweight, strongly typed computation graph spec.

  • Run language models using Hugging Face Transformers, TransformerLens, Ollama, etc., and save activations from the residual stream
  • Train linear probes from cached activations and evaluate their results
  • Fetch logit scores for guardrail models
  • Load and prepare datasets

Applications

  • Build better guardrails using linear probes that understand model internals
  • Explore large datasets grouped by semantic similarity
  • Visualize high-dimensional embeddings with built-in UMAP scatterplots
  • Evaluate derivative experiments quickly, with full caching and provenance tracking of results
  • Infrastructure-agnostic: Run in a notebook, on your GPU machine's CLI, or via the task orchestration framework of your choice!
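The caching and provenance tracking mentioned above can be pictured as a content-addressed cache: each operation gets an ID derived from a hash of its own parameters plus its inputs' IDs, so a result is recomputed only when something upstream changes. The sketch below illustrates that idea under this assumption only; `Op`, `CachingRunner`, `uuid`, and `materialize` are hypothetical names and are not krnel-graph's actual API or implementation.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Op:
    """Hypothetical graph node: an operation name, its (key, value) params, and input ops."""
    name: str
    params: tuple = ()
    inputs: tuple = ()

    def uuid(self) -> str:
        # Provenance: the ID covers this op's spec AND its inputs' IDs,
        # so changing anything upstream yields a different ID.
        spec = {
            "name": self.name,
            "params": [list(p) for p in self.params],
            "inputs": [i.uuid() for i in self.inputs],
        }
        blob = json.dumps(spec, sort_keys=True).encode()
        return hashlib.sha256(blob).hexdigest()[:16]


class CachingRunner:
    """Hypothetical runner that computes each op at most once per unique ID."""

    def __init__(self, impls: dict[str, Callable[..., Any]]):
        self.impls = impls
        self.cache: dict[str, Any] = {}
        self.calls = 0  # number of real computations (cache misses)

    def materialize(self, op: Op) -> Any:
        key = op.uuid()
        if key not in self.cache:
            # Materialize inputs first (recursively), then run this op.
            args = [self.materialize(i) for i in op.inputs]
            self.calls += 1
            self.cache[key] = self.impls[op.name](*args, **dict(op.params))
        return self.cache[key]


impls = {
    "const": lambda value: value,
    "scale": lambda x, factor: [v * factor for v in x],
}
data = Op("const", (("value", [1, 2, 3]),))
doubled = Op("scale", (("factor", 2),), (data,))

sketch_runner = CachingRunner(impls)
print(sketch_runner.materialize(doubled))  # computes "const" and "scale"
print(sketch_runner.materialize(doubled))  # served from the cache
print(sketch_runner.calls)
```

Keying the cache on a hash of the full upstream spec, rather than on variable names, is what lets a result be reused safely across sessions: two ops with the same ID were built from identical inputs and parameters.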


Quick start

Krnel-graph works on the following platforms:

  • macOS (arm64, MPS, Apple M1 or better)
  • Linux (amd64, CUDA)
  • Windows native (amd64, CUDA)
  • Windows WSL2 (amd64, CUDA)

Install from PyPI with uv:

# Quote the package spec so shells such as zsh don't try to expand the brackets
$ uv add "krnel-graph[cli,ml]"

# (Optional) Configure where Runner() saves results
# Defaults to /tmp
$ uv run krnel-graph config --store-uri /tmp/krnel/
# s3://, gs://, or any other fsspec-supported URL also works

Create main.py with the following definitions:

from krnel.graph import Runner
runner = Runner()

# Load data
ds_train   = runner.from_parquet('data_train.parquet')
col_prompt = ds_train.col_text("prompt")
col_label  = ds_train.col_categorical("label")

# Get activations from a small model
X_train = col_prompt.llm_layer_activations(
    model="hf:gpt2",
    layer=-1,
)

# Train a probe on contrastive examples
train_positives = col_label.is_in({"positive_label_1", "positive_label_2"})
train_negatives = ~train_positives
probe = X_train.train_classifier(
    positives=train_positives,
    negatives=train_negatives,
)

# Get test activations by substituting training set with testing set
# (no need to repeat the entire graph)
ds_test = runner.from_parquet('data_test.parquet')
X_test = X_train.subs((ds_train, ds_test))

test_scores = probe.predict(X_test)
eval_result = test_scores.evaluate(
    gt_positives=train_positives.subs((ds_train, ds_test)),
    gt_negatives=train_negatives.subs((ds_train, ds_test)),
)

if __name__ == "__main__":
    # All operations are lazily evaluated until materialized:
    print(runner.to_json(eval_result))
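Conceptually, the `train_classifier` step fits a linear probe (a weight vector over activation dimensions that separates positives from negatives), and `evaluate` measures how well the probe's scores rank held-out examples. The sketch below illustrates that idea with plain NumPy logistic regression and a pairwise ROC AUC on synthetic "activations". It is a stand-in under those assumptions, not krnel-graph's implementation: the library may use a different classifier, regularization, or metrics, and every function name here is hypothetical.

```python
import numpy as np


def train_linear_probe(X, y, lr=0.1, epochs=500, l2=1e-3):
    """Fit logistic-regression weights with full-batch gradient descent."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(positive)
        grad = p - y                             # d(log-loss)/d(logit)
        w -= lr * (X.T @ grad / n + l2 * w)
        b -= lr * grad.mean()
    return w, b


def predict_scores(X, w, b):
    """Return logits: higher means 'more likely positive'."""
    return X @ w + b


def roc_auc(scores, is_positive):
    """AUC = P(random positive outscores random negative), ties counted as 1/2."""
    pos = scores[is_positive]
    neg = scores[~is_positive]
    diff = pos[:, None] - neg[None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())


# Synthetic stand-in for cached activations: positives and negatives are
# shifted in opposite directions along one hidden "concept" direction.
rng = np.random.default_rng(0)
d = 16
direction = rng.normal(size=d)


def make_split(n):
    y = rng.integers(0, 2, size=n).astype(bool)
    X = rng.normal(size=(n, d)) + 0.5 * np.outer(np.where(y, 1.0, -1.0), direction)
    return X, y


acts_train, y_train = make_split(400)
acts_test, y_test = make_split(200)

w, b = train_linear_probe(acts_train, y_train.astype(float))
auc = roc_auc(predict_scores(acts_test, w, b), y_test)
print(f"test AUC: {auc:.3f}")
```

Mirroring main.py, the probe is fit only on the training split and then scored on the test split, which is the role `subs((ds_train, ds_test))` plays in the graph above.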

Then, inspect the results in a notebook:

from main import runner, eval_result, X_train

# Materialize everything and print result:
print(runner.to_json(eval_result))

# Display activations of the training set (GPU-intensive operation)
print(runner.to_numpy(X_train))
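The lazy evaluation and `.subs(...)` substitution used in main.py can be pictured as an immutable expression graph: building a node only records the operation, and substituting one node rebuilds everything downstream of it while leaving the original graph untouched. The following is a minimal sketch under that assumption; `Node`, `subs(old, new)`, and `materialize` are hypothetical names (in the source, `subs` is called with an `(old, new)` tuple), and this sketch has no caching or typing.

```python
from dataclasses import dataclass, replace
from typing import Any, Callable


@dataclass(frozen=True)
class Node:
    """Hypothetical lazy node: nothing runs until materialize() is called."""
    fn: Callable[..., Any]
    inputs: tuple = ()
    label: str = ""

    def subs(self, old: "Node", new: "Node") -> "Node":
        # Return a copy of this subgraph with `old` swapped for `new`.
        if self is old:
            return new
        return replace(self, inputs=tuple(i.subs(old, new) for i in self.inputs))


def materialize(node: Node) -> Any:
    """Evaluate inputs recursively, then apply this node's function."""
    return node.fn(*(materialize(i) for i in node.inputs))


log = []  # records which datasets were actually loaded


def load(name):
    def _load():
        log.append(name)
        return {"train": [1, 2, 3], "test": [10, 20]}[name]
    return _load


ds_train = Node(load("train"), label="train")
ds_test = Node(load("test"), label="test")
total = Node(sum, (ds_train,), label="sum")   # graph built, nothing computed
total_test = total.subs(ds_train, ds_test)    # same shape, new input
assert log == []                              # still lazy

print(materialize(total_test))  # loads only the test set
```

Because nodes are immutable, `subs` never mutates `total`; both the training and test versions of the graph remain available, which is why the test-set graph in main.py does not need to be written out a second time.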

Or use the (completely optional) krnel-graph CLI to materialize a selection of operations and/or monitor progress:

# Run parts of the graph
$ uv run krnel-graph run -f main.py -t LLMLayerActivations   # By operation type
$ uv run krnel-graph run -f main.py -s X_train               # By Python variable name

# Show status
$ uv run krnel-graph summary -f main.py

# Diff the pseudocode of two graph operations
$ uv run krnel-graph print -f main.py -s X_train > /tmp/train.txt
$ uv run krnel-graph print -f main.py -s X_test > /tmp/test.txt
$ git diff --no-index /tmp/train.txt /tmp/test.txt
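The same comparison can also be done without git, using Python's standard-library `difflib`. This is a minimal sketch; `diff_texts` and `diff_files` are hypothetical helper names, and it assumes the two pseudocode files were already written by the `krnel-graph print` commands above.

```python
import difflib
from pathlib import Path


def diff_texts(a_text: str, b_text: str, a_name: str = "a", b_name: str = "b") -> str:
    """Return a unified diff of two strings (empty string if identical)."""
    return "".join(
        difflib.unified_diff(
            a_text.splitlines(keepends=True),
            b_text.splitlines(keepends=True),
            fromfile=a_name,
            tofile=b_name,
        )
    )


def diff_files(a: str, b: str) -> str:
    """Unified diff of two text files, similar in spirit to `git diff --no-index`."""
    return diff_texts(Path(a).read_text(), Path(b).read_text(), a, b)
```

For example, `print(diff_files("/tmp/train.txt", "/tmp/test.txt"))` prints a unified diff of the two pseudocode listings.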


Download files


Source Distribution

krnel_graph-0.1.7.tar.gz (83.4 kB)

Uploaded Source

Built Distribution


krnel_graph-0.1.7-py3-none-any.whl (67.4 kB)

Uploaded Python 3

File details

Details for the file krnel_graph-0.1.7.tar.gz.

File metadata

  • Download URL: krnel_graph-0.1.7.tar.gz
  • Upload date:
  • Size: 83.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for krnel_graph-0.1.7.tar.gz
Algorithm Hash digest
SHA256 eea70535147331b6896c869ae5415d912e3a9b5492025d318668c71ffeaf52d3
MD5 0483347da436ac9403ec491f437ac335
BLAKE2b-256 415fb5ecc1ad233c7b300f367e05329fa36e98a401851260e59ed2f90f8c9dd1
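To check a downloaded file against the SHA256 digest listed above, hash it locally and compare. A minimal standard-library sketch follows; `sha256_of` and `matches_expected` are hypothetical helper names, and `EXPECTED` is the source-distribution digest from the table above. (pip can also enforce digests at install time through its hash-checking mode, `--require-hashes`.)

```python
import hashlib
from pathlib import Path

# SHA256 of krnel_graph-0.1.7.tar.gz, as listed above.
EXPECTED = "eea70535147331b6896c869ae5415d912e3a9b5492025d318668c71ffeaf52d3"


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large downloads need not fit in memory."""
    h = hashlib.sha256()
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_expected(path: str, expected: str = EXPECTED) -> bool:
    """True if the file's SHA256 equals the published digest."""
    return sha256_of(path) == expected.lower()
```

For example, `matches_expected("krnel_graph-0.1.7.tar.gz")` should return `True` for an untampered download of that file.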


Provenance

The following attestation bundles were made for krnel_graph-0.1.7.tar.gz:

Publisher: publish-to-pypi.yml on krnel-ai/krnel-graph

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file krnel_graph-0.1.7-py3-none-any.whl.

File metadata

  • Download URL: krnel_graph-0.1.7-py3-none-any.whl
  • Upload date:
  • Size: 67.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for krnel_graph-0.1.7-py3-none-any.whl
Algorithm Hash digest
SHA256 3afec1ca36b8d3350b23d163a0f633e5c3b5a7df0fa1518fb8199380bd9d89b2
MD5 1b24ddf80b915cc614bbcbdd69066813
BLAKE2b-256 76d6b6cb687573536a2d3c7bac63885dfda601452397dff7b8f50740be356e1b


Provenance

The following attestation bundles were made for krnel_graph-0.1.7-py3-none-any.whl:

Publisher: publish-to-pypi.yml on krnel-ai/krnel-graph

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
