
Kiji Inspector: Mechanistic Interpretability for AI Agent Tool Selection


Status

This project is under heavy active development. We are planning to release a stable version of the framework in the coming weeks.

In the meantime, join our Slack Community

Learn more about our approach and early results:


What This Project Does

This project trains Sparse Autoencoders (SAEs) on the internal activations of an AI agent to understand why it selects specific tools. Given a user request like "Search our docs for API limits," the agent must choose between tools (e.g., internal_search vs web_search). We extract the model's hidden representations at the moment of that decision, decompose them into interpretable features using a JumpReLU SAE, and validate the resulting explanations through automated fuzzing and causal ablation experiments.
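To make the JumpReLU step concrete, here is a minimal sketch of an SAE forward pass in plain Python. It is not the kiji-inspector implementation: the helper names (matvec, jumprelu, encode, decode), the weights, the thresholds, and the tiny dimensions are all assumptions chosen for illustration. It only shows the general idea that a feature fires when its pre-activation exceeds a per-feature threshold, and that the sparse feature vector is then decoded back into activation space.

```python
# Illustrative JumpReLU SAE forward pass (toy values, not the library's code).

def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def jumprelu(pre_acts, thresholds):
    """Pass a pre-activation through unchanged only if it exceeds its threshold."""
    return [z if z > t else 0.0 for z, t in zip(pre_acts, thresholds)]

def encode(x, w_enc, b_enc, thresholds):
    """Project an activation into feature space and apply JumpReLU."""
    pre = [z + b for z, b in zip(matvec(w_enc, x), b_enc)]
    return jumprelu(pre, thresholds)

def decode(features, w_dec, b_dec):
    """Map sparse features back to activation space."""
    return [r + b for r, b in zip(matvec(w_dec, features), b_dec)]

# Hypothetical toy setup: 2-dimensional activation, 3 SAE features.
W_ENC = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # shape 3 x 2
B_ENC = [0.0, 0.0, -1.0]
THETA = [0.5, 0.5, 0.5]                       # per-feature thresholds
W_DEC = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]    # shape 2 x 3
B_DEC = [0.0, 0.0]

activation = [2.0, 0.25]
features = encode(activation, W_ENC, B_ENC, THETA)
reconstruction = decode(features, W_DEC, B_DEC)
```

In this toy run the second feature stays at zero because its pre-activation (0.25) is below the threshold, which is the sparsity behavior a JumpReLU SAE relies on.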

The key insight: train the SAE on raw activations (not difference vectors), then use contrastive pairs post-hoc to identify which learned features correspond to specific tool-selection decisions. This preserves the SAE's general feature dictionary while enabling targeted analysis of decision-relevant features.
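One simple way to picture the post-hoc contrastive step is sketched below. The README does not specify the scoring method the project uses, so this example assumes a plain mean-difference score: for each contrastive pair, SAE features are computed for both prompts, the per-feature differences are averaged, and features are ranked by absolute difference. The function names and feature values are hypothetical.

```python
# Illustrative ranking of SAE features by contrastive mean difference.
# The scoring rule is an assumption, not the project's documented method.

def mean_feature_difference(pairs):
    """Average (features_a - features_b) over all pairs, per feature index."""
    n_features = len(pairs[0][0])
    totals = [0.0] * n_features
    for feats_a, feats_b in pairs:
        for i in range(n_features):
            totals[i] += feats_a[i] - feats_b[i]
    return [t / len(pairs) for t in totals]

def rank_features(pairs, top_k=2):
    """Return (feature_index, mean_difference) for the top_k largest |difference|."""
    diffs = mean_feature_difference(pairs)
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]), reverse=True)
    return [(i, diffs[i]) for i in order[:top_k]]

# Each pair holds SAE features for two prompts that differ only in the
# tool the agent should pick (e.g. internal_search vs web_search).
pairs = [
    ([2.0, 0.0, 1.0], [0.0, 0.0, 1.0]),
    ([4.0, 0.5, 1.0], [0.0, 1.5, 1.0]),
]
top = rank_features(pairs)
```

Because the SAE itself was trained on raw activations, the feature dictionary stays the same across analyses; only the pairs fed into a ranking like this one change.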

Install

For loading and running pretrained SAEs:

pip install kiji-inspector

For the full extraction, training, and analysis workflow:

pip install 'kiji-inspector[train]'

kiji-inspector[full] is also available as an alias for the same full stack.

Quick Start

from kiji_inspector import SAE

sae, feature_descriptions = SAE.from_pretrained(
    base_model="nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16",
    layer=20,
)

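# `activations` is assumed to be a tensor of hidden states captured at the same layer of the base model.
features = sae.encode(activations)
reconstruction = sae.decode(features)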

Training and data-generation entrypoints live under the package namespace:

python -m kiji_inspector.generate_pairs 1300
python -m kiji_inspector.pipeline --layers 10 20 30

Local vLLM patches

For local experiments that require the custom vllm extraction changes, rebuild the environment and apply the patch set from the repository root:

uv sync --no-cache --refresh --extra full --group dev
./patches/apply-patch.sh

The script applies every *.patch file under patches/ in lexical order:

  • 01_allow_extract_hidden_states.patch
  • 02_support_nemotron_models.patch
  • 03_support_gemma3_models.patch

Additional workflow details live in patches/README_PATCH.md.
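The numeric prefixes on the patch names are what make lexical ordering deterministic. The sketch below illustrates that ordering in Python; it does not reproduce apply-patch.sh, and the helper name patches_in_lexical_order and the temporary directory are assumptions used only to demonstrate the sort.

```python
import tempfile
from pathlib import Path

def patches_in_lexical_order(patch_dir):
    """Return the names of *.patch files under patch_dir, sorted lexically."""
    return sorted(p.name for p in Path(patch_dir).glob("*.patch"))

# Create empty placeholder files in a temporary directory, out of order,
# plus one non-patch file that the glob should ignore.
with tempfile.TemporaryDirectory() as tmp:
    for name in [
        "03_support_gemma3_models.patch",
        "01_allow_extract_hidden_states.patch",
        "02_support_nemotron_models.patch",
        "README_PATCH.md",
    ]:
        (Path(tmp) / name).touch()
    ordered = patches_in_lexical_order(tmp)
```

Keeping a zero-padded prefix on any new patch file would preserve this ordering once the set grows past nine patches.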


🤝 Contributing

We welcome contributions! Whether you're fixing a bug, improving documentation, or proposing a new feature, your help is appreciated.

Ways to Contribute

  • Report Bugs - Open an issue with steps to reproduce
  • Improve Docs - Documentation PRs are always welcome
  • Submit Features - Open an issue to discuss your idea before submitting a PR
  • Share Feedback - Start a discussion

Community


📄 License

Copyright (c) 2026 Dataiku SAS

This project is licensed under the Apache 2.0 License - see the LICENSE file for details.
