INterpretability Interchange Format for tokenized LLM generation traces
INterpretability Interchange Format (INIF)
This package is in early development and the API may change without deprecation. Feedback and contributions are very welcome!
The INterpretability Interchange Format (INIF) is a JSON-based format for tokenized LLM generation traces, with support for tagging, position selection, and efficient storage of interpretability outputs.
It is designed as the interchange layer between generation and evaluation frameworks (e.g. Inspect AI) and interpretability tools in the NDIF ecosystem (nnsight, nnterp, and workbench).
Installation
pip install inif
With Inspect AI converter support:
pip install "inif[inspect]"
Quick start
From text
from inif import load, save
from inif.converters.text import from_texts
doc = from_texts(
    ["The capital of France is Paris.", "Hello world!"],
    tokenizer_name="gpt2",
)
save(doc, "traces.inif.json")
From Inspect AI eval logs
from inif.converters.inspect_ai import from_eval_file
doc = from_eval_file("logs/my_eval.eval")
Viewing
from inif import load, show, save_html
doc = load("traces.inif.json")
show(doc) # in Jupyter
save_html(doc, "traces.html") # self-contained HTML
CLI
# Convert text files to inif
inif convert txt input.txt -m gpt2
# Convert Inspect AI eval logs
inif convert eval logs/my_eval.eval
# View as interactive HTML in the browser
inif view traces.inif.json
Format overview
An .inif.json file contains:
InifDocument
├── metadata: model info, source eval, packages, timestamps
├── sequences[]: deduplicated token patterns shared across samples
└── samples[]: tokenized generation traces
    ├── tokens[]: token id + string, plus extra fields (tags, role, logprob, data, ...)
    ├── texts[]: original message strings
    ├── spans[]: named position ranges
    └── scores[]: evaluation scores (scorer, value, answer)
Token ID convention: id >= 0 is a vocabulary token, id == -N references a shared Sequence via sequence_id.
Extensible tokens: tags, logprobs, chat roles, and interpretability outputs (logit lens, probes, etc.) are stored as extra fields on each token.
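As a rough illustration of the token ID convention, the sketch below flattens a token list in which negative ids point at shared sequences. It uses plain dicts rather than the inif classes. The key names (`id`, `string`, `sequence_id`, `tokens`), the helper `expand_token_ids`, and the reading that id == -N refers to the sequence whose sequence_id is N are all assumptions made for this example, not a confirmed description of the schema.

```python
# Hypothetical stand-ins for INIF tokens and sequences (plain dicts, not inif classes).
# Assumption: a token with id == -N refers to the sequence whose sequence_id == N.
def expand_token_ids(tokens, sequences):
    """Return a flat token list with sequence references replaced by their tokens."""
    by_id = {seq["sequence_id"]: seq["tokens"] for seq in sequences}
    flat = []
    for tok in tokens:
        if tok["id"] >= 0:
            flat.append(tok)  # ordinary vocabulary token
        else:
            flat.extend(by_id[-tok["id"]])  # reference to a shared sequence
    return flat


# Made-up token ids and strings, chosen only for illustration.
sequences = [
    {
        "sequence_id": 1,
        "tokens": [{"id": 11, "string": "You"}, {"id": 12, "string": " are"}],
    },
]
tokens = [{"id": -1}, {"id": 13, "string": " helpful"}]
flat_tokens = expand_token_ids(tokens, sequences)
```

In practice the library's own expand_sequences is the function to use; the sketch only shows why a single negative id can stand in for a whole run of tokens.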
Key features
Tagging
from inif import tag_by_regex_all, tag_by_text_regex, create_span_from_tag
# Tag tokens matching a regex pattern
tag_by_regex_all(doc, r"^\d+$", "number")
# Tag by concatenated text (multi-token matches); `sample` is one entry of the document's samples[]
tag_by_text_regex(sample, r"Paris", "city")
# Convert tags to named spans
create_span_from_tag(sample, "city", "answer_span")
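To make the idea of multi-token matches concrete, here is a minimal standalone sketch of how a regex match on concatenated text can be mapped back to token positions. It is not the implementation behind tag_by_text_regex; the helper name `tokens_matching_text` and the hand-written token strings are hypothetical, and a real tokenizer may split the text differently.

```python
import re


def tokens_matching_text(token_strings, pattern):
    """Return indices of tokens that overlap a regex match on the joined text."""
    # Character offset at which each token starts in the concatenated text.
    starts, pos = [], 0
    for s in token_strings:
        starts.append(pos)
        pos += len(s)

    text = "".join(token_strings)
    hits = set()
    for m in re.finditer(pattern, text):
        for i, (start, s) in enumerate(zip(starts, token_strings)):
            end = start + len(s)
            # Half-open interval overlap between token [start, end) and the match.
            if start < m.end() and m.start() < end:
                hits.add(i)
    return sorted(hits)


# Hand-written token strings (an assumed split, not real tokenizer output).
token_strings = ["The", " capital", " is", " Par", "is", "."]
matched = tokens_matching_text(token_strings, r"Paris")
```

Here the match spans the two tokens " Par" and "is", which a per-token regex could not find on its own.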
Selection
from inif import select_by_tag, select_by_span, select_by_position
selection = select_by_tag(sample, "number")
selection = select_by_span(sample, "answer_span")
selection = select_by_position(sample, slice(5, 10))
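Conceptually, each of these calls narrows a sample down to a set of token positions. The sketch below mimics that on plain dicts, assuming a selection boils down to a list of indices and that tags live in a per-token `tags` list. Both points, and the helper names, are assumptions for illustration rather than the library's actual return types.

```python
# Hypothetical plain-dict tokens; the "tags" key is an assumed field name.
def positions_with_tag(tokens, tag):
    """Indices of tokens carrying the given tag."""
    return [i for i, tok in enumerate(tokens) if tag in tok.get("tags", [])]


def positions_in_slice(tokens, sl):
    """Indices covered by a slice, clipped to the token list's length."""
    return list(range(len(tokens)))[sl]


tokens = [
    {"string": "I"},
    {"string": " have"},
    {"string": " 3", "tags": ["number"]},
    {"string": " cats"},
]
by_tag = positions_with_tag(tokens, "number")
by_slice = positions_in_slice(tokens, slice(1, 3))
```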
Sequence deduplication
Token sequences that recur across samples (e.g. shared system prompts) are automatically deduplicated via set intersection and stored once as Sequence objects, which tokens reference through a negative id (see the token ID convention above).
from inif import deduplicate_sequences, expand_sequences
deduplicate_sequences(doc, min_length=3) # compress
expand_sequences(doc) # flatten back
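The compress/expand round trip can be illustrated with a deliberately simpler technique: factoring out the longest common prefix of several token id lists. This is a swapped-in stand-in, not the set-intersection approach the library describes. The helper names `factor_shared_prefix` and `restore` are hypothetical, and `min_length` here only mirrors the parameter name shown above.

```python
def factor_shared_prefix(samples, min_length=3):
    """Split samples into (shared_prefix, remainders).

    The prefix is left empty when the common run is shorter than min_length.
    """
    if not samples:
        return [], []
    prefix_len = 0
    for column in zip(*samples):  # walk positions shared by all samples
        if any(tok != column[0] for tok in column):
            break
        prefix_len += 1
    if prefix_len < min_length:
        prefix_len = 0  # too short to be worth storing separately
    prefix = list(samples[0][:prefix_len])
    return prefix, [list(s[prefix_len:]) for s in samples]


def restore(prefix, remainders):
    """Inverse of factor_shared_prefix: prepend the prefix to each remainder."""
    return [prefix + r for r in remainders]


# Made-up token ids; the first four play the role of a shared system prompt.
samples = [[5, 6, 7, 8, 1], [5, 6, 7, 8, 2, 3]]
prefix, rest = factor_shared_prefix(samples)
```

The point of the round trip is that restoring the prefix yields the original samples unchanged, so deduplication only affects storage size, not content.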
Interactive HTML viewer
save_html / show produce a self-contained HTML page with:
- Collapsible sidebar with sample list and pass/fail indicators
- Token-level display with hover tooltips showing all extra fields
- Toggleable role and tag highlighting with color legends
- Span border annotations and extra-field underline indicators
- Newline-aware token wrapping
Development
make dev # install dev environment
make test # run tests
make format # ruff format
make lint # ruff check
make typecheck # ty check
make schema # regenerate JSON schema
License
MIT
File details
Details for the file inif-0.0.1.tar.gz.
File metadata
- Download URL: inif-0.0.1.tar.gz
- Upload date:
- Size: 255.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.8.18
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fc83bb66f5374621f8ddbe0ccdc10f565fdad450909b1284e07a30f1d6df1da3 |
| MD5 | c3c5745a36b4ebb73d81a472d2b92379 |
| BLAKE2b-256 | a79f50b10257cfca7f180bba2ed68871103b69ad2502202b43fa55c453bd903a |
File details
Details for the file inif-0.0.1-py3-none-any.whl.
File metadata
- Download URL: inif-0.0.1-py3-none-any.whl
- Upload date:
- Size: 44.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.8.18
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 04e4dd7be12a1190975c889ac3ae8e15d448b11ca7ff710dd177cf48b622cc05 |
| MD5 | a9cb2106cf50028f298279d2f94d4f28 |
| BLAKE2b-256 | 25a7d4fec4e6f1d11ff90391e15b385dbf3def1779e66cba997e3e4d63d2a8b8 |