A versatile kit for training and using linear probes on neural network activations.

Probes

A lightweight, modular library for training linear probes and steering vectors on neural network activations.

Core Design (V2)

This library separates Semantics (the probe model) from Fitting (how it's learned).

1. The Models: LinearProbe and ProbeCollection

  • LinearProbe (probes.core.probe): A container for a single probe, plus its normalization statistics.
  • ProbeCollection (probes.core.collection): A container for a batch of probes.
    • to_tensor(): Stacks weights into [B, D] and biases into [B].
    • best_layer(metric): Finds the probe that scores best on the given metric (for example, validation accuracy).
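
As a rough illustration of the container semantics described above, the sketch below uses plain Python lists in place of tensors. The class names `ProbeSketch` and `CollectionSketch`, their fields (`weight`, `bias`, `layer`, `metrics`), and the `higher_is_better` argument are hypothetical stand-ins chosen for this example; they are not the library's actual definitions.

```python
from dataclasses import dataclass, field


@dataclass
class ProbeSketch:
    """Hypothetical stand-in for LinearProbe: one weight vector plus a bias."""
    weight: list          # shape [D]
    bias: float
    layer: int
    metrics: dict = field(default_factory=dict)


@dataclass
class CollectionSketch:
    """Hypothetical stand-in for ProbeCollection: a batch of B probes."""
    probes: list

    def to_tensor(self):
        # Stack per-probe weights into [B, D] and biases into [B].
        weights = [list(p.weight) for p in self.probes]
        biases = [p.bias for p in self.probes]
        return weights, biases

    def best_layer(self, metric, higher_is_better=True):
        # Return the probe that scores best on the requested metric.
        pick = max if higher_is_better else min
        return pick(self.probes, key=lambda p: p.metrics[metric])


collection = CollectionSketch([
    ProbeSketch([0.1, -0.2], 0.0, layer=8, metrics={"val_acc": 0.71}),
    ProbeSketch([0.4, 0.3], 0.1, layer=9, metrics={"val_acc": 0.88}),
    ProbeSketch([0.2, 0.5], -0.1, layer=10, metrics={"val_acc": 0.80}),
])
weights, biases = collection.to_tensor()
best = collection.best_layer("val_acc")
```

Here `weights` is a 3 x 2 nested list, `biases` has three entries, and `best` is the probe whose `val_acc` is highest. The real classes presumably hold tensors rather than lists.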

2. The Fitters

Functional solvers in probes.fitters take training data and return a LinearProbe (or ProbeCollection).

  • fit_logistic: Standard L2-regularized Logistic Regression.
  • fit_elastic_net: ElasticNet (L1 + L2), useful for sparse features (SAEs, Neurons).
  • fit_dim: Difference-in-Means (Class 1 Mean - Class 0 Mean).
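
To make the Difference-in-Means idea concrete, here is a minimal pure-Python sketch. The function name `fit_dim_sketch` is hypothetical, and placing the bias at the midpoint between the two class means is an assumption made for this example; the source only specifies the direction (class 1 mean minus class 0 mean).

```python
def fit_dim_sketch(X, y):
    """Difference-in-means direction for binary labels (illustrative only).

    X is a list of N feature rows, each of length D; y holds 0/1 labels.
    Returns (weight, bias) so that score(x) = dot(weight, x) + bias.
    """
    dim = len(X[0])
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")

    mean_pos = [sum(row[j] for row in pos) / len(pos) for j in range(dim)]
    mean_neg = [sum(row[j] for row in neg) / len(neg) for j in range(dim)]

    # Direction: class 1 mean minus class 0 mean.
    weight = [p - n for p, n in zip(mean_pos, mean_neg)]

    # Assumption: the decision threshold sits at the midpoint of the two means.
    midpoint = [(p + n) / 2 for p, n in zip(mean_pos, mean_neg)]
    bias = -sum(w * m for w, m in zip(weight, midpoint))
    return weight, bias


X = [[1.0, 0.0], [3.0, 0.0], [-1.0, 0.0], [-3.0, 0.0]]
y = [1, 1, 0, 0]
weight, bias = fit_dim_sketch(X, y)
```

Unlike the logistic and elastic-net fitters, this needs no iterative optimization, which is why it vectorizes so easily across a batch.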

Batched GPU Fitters

Optimized PyTorch implementations in probes.fitters.batch handle 3D inputs [B, N, D] efficiently on GPU:

  • fit_logistic_batch: Batched IRLS solver.
  • fit_dim_batch: Vectorized DiM with median thresholding.
  • fit_elastic_net_path: Fits a regularization path (multiple alphas) with ISTA, warm-starting each fit from the previous solution.
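
The warm-started path can be sketched in a few lines of pure Python. This is not the library's implementation: the name `elastic_net_path_sketch`, the `l1_ratio` and `n_iter` parameters, and the use of a squared-error loss (rather than whatever loss the real fitter uses) are all assumptions made to keep the example short. It shows ISTA (a gradient step followed by soft-thresholding) run over a descending list of alphas, where each fit starts from the previous solution.

```python
def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net_path_sketch(X, y, alphas, l1_ratio=0.5, n_iter=200):
    """ISTA over a list of alphas, warm-starting each fit (illustrative only).

    Minimizes (1/2n)||Xw - y||^2
              + alpha * (l1_ratio * ||w||_1 + 0.5 * (1 - l1_ratio) * ||w||^2).
    Returns one weight list per alpha, in the order given.
    """
    n, dim = len(X), len(X[0])
    # The squared Frobenius norm over n upper-bounds the largest
    # eigenvalue of X^T X / n, so 1 / (bound + ridge term) is a safe step.
    smooth = sum(v * v for row in X for v in row) / n
    w = [0.0] * dim
    path = []
    for alpha in alphas:
        ridge = alpha * (1.0 - l1_ratio)
        step = 1.0 / (smooth + ridge)
        for _ in range(n_iter):
            residual = [sum(wj * xj for wj, xj in zip(w, row)) - yi
                        for row, yi in zip(X, y)]
            grad = [sum(r * row[j] for r, row in zip(residual, X)) / n
                    + ridge * w[j]
                    for j in range(dim)]
            w = [soft_threshold(w[j] - step * grad[j], step * alpha * l1_ratio)
                 for j in range(dim)]
        path.append(list(w))  # the next alpha starts from this solution
    return path


# Feature 0 predicts y exactly; feature 1 is small noise.
X = [[1.0, 0.0], [2.0, 0.1], [3.0, -0.1], [4.0, 0.0]]
y = [1.0, 2.0, 3.0, 4.0]
path = elastic_net_path_sketch(X, y, alphas=[100.0, 1.0, 0.01])
```

With the largest alpha every weight is thresholded to zero; as alpha shrinks, the informative feature enters the model while the noise feature stays near zero.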

Quick Start

The high-level API routes automatically based on the dimensionality of X (2D for a single probe, 3D for a batch of probes):

from probes import sae_probe, dim_probe

# X_2d, X_3d, y_1d, and y are placeholders for your own activations and labels.

# 1. Single probe (X: [N, D], y: [N])
probe = sae_probe(X_2d, y_1d)

# 2. Batched probes (X: [B, N, D], y: [B, N] or [N])
# Automatically uses the GPU fitters and returns a ProbeCollection
probe_collection = sae_probe(X_3d, y)
weights, biases = probe_collection.to_tensor()  # [B, D], [B]
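
The routing rule can be pictured with the sketch below. It is only a guess at the behavior implied by the shape comments above: the helper names `ndim` and `route` are hypothetical, nested lists stand in for tensors, and broadcasting a shared [N] label vector to [B, N] is an assumption about how the `[B, N] or [N]` case is handled.

```python
def ndim(x):
    """Number of nested list levels, a stand-in for a tensor's ndim."""
    depth = 0
    while isinstance(x, list):
        depth += 1
        if not x:
            break
        x = x[0]
    return depth


def route(X, y):
    """Pick a single or batched fit from the input shape (illustrative only).

    Returns (mode, y), where y has been broadcast to [B, N] in batched mode.
    """
    if ndim(X) == 2:
        if ndim(y) != 1:
            raise ValueError("2D inputs expect labels of shape [N]")
        return "single", y
    if ndim(X) == 3:
        if ndim(y) == 1:
            # Shared labels: reuse the same [N] vector for every probe.
            y = [list(y) for _ in range(len(X))]
        elif ndim(y) != 2 or len(y) != len(X):
            raise ValueError("3D inputs expect labels of shape [N] or [B, N]")
        return "batched", y
    raise ValueError("X must be [N, D] or [B, N, D]")


X_single = [[0.0, 1.0], [1.0, 0.0]]          # [N=2, D=2]
X_batch = [X_single, X_single, X_single]     # [B=3, N=2, D=2]
labels = [0, 1]                              # [N=2]

mode_single, _ = route(X_single, labels)
mode_batch, y_batch = route(X_batch, labels)
```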

Steering Vectors

You can build steering vectors for individual probes or entire collections:

from probes import build_steering_vector, build_steering_vectors

# Single
vec = build_steering_vector(probe, sae_model, layer=10)

# Batched (Maps layers to probes)
vecs = build_steering_vectors(probe_collection, sae_model, layers=[8, 9, 10])
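
The text above does not say how a probe becomes a steering vector. One common approach for probes trained on SAE features, assumed here purely for illustration, is to project the probe weights through the SAE decoder into the model's activation space and normalize the result. The function name `steering_vector_sketch`, the `decoder` layout ([F, D], one row per feature), and the `normalize` flag are hypothetical and may not match what `build_steering_vector` does.

```python
import math


def steering_vector_sketch(probe_weight, decoder, normalize=True):
    """Map a probe over F SAE features into a D-dim activation direction.

    probe_weight: list of F floats.
    decoder: F rows of length D (row i is feature i's decoder direction).
    """
    d_model = len(decoder[0])
    # Weighted sum of decoder rows: vec[j] = sum_i probe_weight[i] * decoder[i][j]
    vec = [sum(w * row[j] for w, row in zip(probe_weight, decoder))
           for j in range(d_model)]
    if normalize:
        norm = math.sqrt(sum(v * v for v in vec))
        if norm > 0:
            vec = [v / norm for v in vec]
    return vec


probe_weight = [3.0, 0.0, 4.0]                    # F = 3 SAE features
decoder = [[1.0, 0.0], [5.0, 5.0], [0.0, 1.0]]    # [F = 3, D = 2]
vec = steering_vector_sketch(probe_weight, decoder)
```

A feature with zero probe weight (the second row here) contributes nothing, so a sparse probe yields a steering direction built from only a few decoder rows.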

Structure

  • probes/core/: LinearProbe and ProbeCollection definitions.
  • probes/fitters/:
    • logistic.py, elastic.py, dim.py: Single-probe (CPU/sklearn) fitters.
    • batch/: Optimized GPU-batched fitters (IRLS, ISTA, DiM).
  • probes/api.py: High-level aliases and dimension routing.
  • probes/steering/: Tools for building steering vectors.
