Kiji Inspector: Mechanistic Interpretability for AI Agent Tool Selection

Status

This project is under heavy active development. We are planning to release a stable version of the framework in the coming weeks.

In the meantime, join our Slack Community


What This Project Does

This project trains Sparse Autoencoders (SAEs) on the internal activations of an AI agent to understand why it selects specific tools. Given a user request like "Search our docs for API limits," the agent must choose between tools (e.g., internal_search vs web_search). We extract the model's hidden representations at the moment of that decision, decompose them into interpretable features using a JumpReLU SAE, and validate the resulting explanations through automated fuzzing and causal ablation experiments.
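To make the decomposition step concrete, here is a minimal NumPy sketch of a JumpReLU SAE forward pass and a single-feature ablation. It is an illustration under assumptions, not the package's implementation: `ToyJumpReLUSAE`, `ablate_feature`, the random weights, and the threshold value are all hypothetical. A real causal ablation would also feed the modified reconstruction back into the model and compare the resulting tool choice, which this sketch does not do.

```python
import numpy as np


def jumprelu(pre_acts, threshold):
    """Keep a pre-activation only where it exceeds its per-feature threshold."""
    return np.where(pre_acts > threshold, pre_acts, 0.0)


class ToyJumpReLUSAE:
    """Hypothetical stand-in for a trained SAE; the weights here are random."""

    def __init__(self, d_model, d_sae, seed=0):
        rng = np.random.default_rng(seed)
        self.W_enc = rng.normal(scale=0.1, size=(d_model, d_sae))
        self.b_enc = np.zeros(d_sae)
        self.W_dec = rng.normal(scale=0.1, size=(d_sae, d_model))
        self.b_dec = np.zeros(d_model)
        self.threshold = np.full(d_sae, 0.05)  # assumed value, learned in practice

    def encode(self, x):
        # Activations (n, d_model) -> sparse features (n, d_sae).
        return jumprelu(x @ self.W_enc + self.b_enc, self.threshold)

    def decode(self, features):
        # Sparse features (n, d_sae) -> reconstructed activations (n, d_model).
        return features @ self.W_dec + self.b_dec


def ablate_feature(sae, x, feature_idx):
    """Return reconstructions with and without one feature's contribution."""
    features = sae.encode(x)
    ablated = features.copy()
    ablated[:, feature_idx] = 0.0
    return sae.decode(features), sae.decode(ablated)


sae = ToyJumpReLUSAE(d_model=8, d_sae=32)
x = np.random.default_rng(1).normal(size=(4, 8))
full, without_f3 = ablate_feature(sae, x, feature_idx=3)
```

The difference `full - without_f3` is exactly feature 3's activation times its decoder row, which is the direction an ablation removes from the reconstruction.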

The key insight: train the SAE on raw activations (not difference vectors), then use contrastive pairs post-hoc to identify which learned features correspond to specific tool-selection decisions. This preserves the SAE's general feature dictionary while enabling targeted analysis of decision-relevant features.
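The post-hoc contrastive step can be sketched as follows. This is one plausible way to rank features, assuming each contrastive pair yields two SAE feature vectors (one per tool choice); the function name `rank_contrastive_features` and the mean-difference score are assumptions for illustration, and the project may score features differently.

```python
import numpy as np


def rank_contrastive_features(feats_a, feats_b, top_k=5):
    """Rank SAE features by mean activation difference across contrastive pairs.

    feats_a[i] and feats_b[i] are feature vectors for the two prompts of pair i,
    e.g. one that leads to internal_search and one that leads to web_search.
    """
    mean_diff = (np.asarray(feats_a) - np.asarray(feats_b)).mean(axis=0)
    order = np.argsort(-np.abs(mean_diff))[:top_k]
    # Positive score: the feature fires more on side A; negative: more on side B.
    return [(int(i), float(mean_diff[i])) for i in order]


# Synthetic data: feature 3 is planted as the only one that separates the pairs.
rng = np.random.default_rng(0)
feats_a = rng.random((50, 16)) * 0.1
feats_b = rng.random((50, 16)) * 0.1
feats_a[:, 3] += 1.0

ranking = rank_contrastive_features(feats_a, feats_b, top_k=3)
```

Because the SAE itself never sees the difference vectors, the same dictionary can be reused for other contrasts by swapping in a different set of pairs.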

Install

For loading and running pretrained SAEs:

pip install kiji-inspector

For the full extraction, training, and analysis workflow:

pip install 'kiji-inspector[train]'

kiji-inspector[full] is also available as an alias for the same full stack.

Quick Start

from kiji_inspector import SAE

sae, feature_descriptions = SAE.from_pretrained(
    base_model="nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16",
    layer=20,
)

# `activations` holds hidden states extracted from the base model at the chosen layer.
features = sae.encode(activations)
reconstruction = sae.decode(features)
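A quick sanity check after encoding and decoding is to measure how much of the activations the reconstruction recovers and how sparse the features are. The sketch below assumes the activations, features, and reconstruction have been converted to NumPy arrays of shape (n_samples, dim); the README does not state the return types of `encode` and `decode`, and the helper names here are hypothetical.

```python
import numpy as np


def fraction_variance_unexplained(x, x_hat):
    """Squared reconstruction error relative to the variance of x (0 is perfect)."""
    x = np.asarray(x, dtype=np.float64)
    x_hat = np.asarray(x_hat, dtype=np.float64)
    residual = np.sum((x - x_hat) ** 2)
    total = np.sum((x - x.mean(axis=0)) ** 2)
    return float(residual / total)


def mean_active_features(features):
    """Average number of non-zero features per sample (the L0 sparsity)."""
    return float(np.count_nonzero(np.asarray(features), axis=-1).mean())


# Synthetic stand-ins for real activations and features.
rng = np.random.default_rng(0)
x = rng.normal(size=(16, 8))
fvu_perfect = fraction_variance_unexplained(x, x)
fvu_mean = fraction_variance_unexplained(x, np.broadcast_to(x.mean(axis=0), x.shape))
l0 = mean_active_features(np.array([[0.0, 1.2, 0.0], [0.3, 0.0, 0.7]]))
```

A reconstruction that only predicts the per-dimension mean scores 1.0, so values well below 1.0 indicate that the features carry real information about the activations.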

Training and data-generation entrypoints live under the package namespace:

python -m kiji_inspector.generate_pairs 1300
python -m kiji_inspector.pipeline --layers 10 20 30

Local vLLM patches

For local experiments that require the custom vLLM extraction changes, rebuild the environment and apply the patch set from the repository root:

uv sync --no-cache --refresh --extra full --group dev
./patches/apply-patch.sh

The apply script applies every *.patch file under patches/ in lexical order:

  • 01_allow_extract_hidden_states.patch
  • 02_support_nemotron_models.patch
  • 03_support_gemma3_models.patch
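The numeric prefixes are what fix the order. As a small illustration (not a reimplementation of apply-patch.sh, whose internals are not shown here), the hypothetical helper below lists *.patch files the way a lexical sort would order them, using a temporary directory in place of the real patches folder:

```python
import tempfile
from pathlib import Path


def patches_in_apply_order(patch_dir):
    """Return *.patch file names sorted lexically, ignoring other files."""
    return sorted(p.name for p in Path(patch_dir).glob("*.patch"))


# Create the files out of order to show that the sort, not creation order, decides.
with tempfile.TemporaryDirectory() as tmp:
    for name in [
        "03_support_gemma3_models.patch",
        "01_allow_extract_hidden_states.patch",
        "README_PATCH.md",
        "02_support_nemotron_models.patch",
    ]:
        (Path(tmp) / name).touch()
    order = patches_in_apply_order(tmp)
```

A new patch that must run after the existing ones would therefore need a prefix that sorts later, such as 04_.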

Additional workflow details live in patches/README_PATCH.md.


🤝 Contributing

We welcome contributions! Whether you're fixing a bug, improving documentation, or proposing a new feature, your help is appreciated.

Ways to Contribute

  • Report Bugs - Open an issue with steps to reproduce
  • Improve Docs - Documentation PRs are always welcome
  • Submit Features - Open an issue to discuss your idea before submitting a PR
  • Share Feedback - Start a discussion


📄 License

Copyright (c) 2026 Dataiku SAS

This project is licensed under the Apache 2.0 License - see the LICENSE file for details.
