Information Retrieval

ir

An information-retrieval substrate for agentic systems — one uniform "find the relevant things in this corpus" contract that scales from an ad-hoc search over an ephemeral list to a maintained capability-discovery engine.

Give an agent one search tool, not fifty tool schemas. ir retrieves candidates, commits to a small high-precision subset (the distractor problem is the central selection risk — fewer, better candidates beat more), and discloses each committed item's payload only when asked.

import ir

# Define a corpus, build the index (incremental), then discover:
source = ir.CorpusSource.from_skills()       # or from_packages(), from_md_reports(), from_files(...)
corpus = ir.build(source)                     # embed + persist under XDG dirs
result = ir.discover(corpus, "how do I deploy the app to the server")

for item in result.results:
    print(item.score, item.name)              # the committed few (or result.abstained)
print(result.to_dict())                       # JSON-serializable (qh / HTTP ready)

The pipeline

ir is a five-stage pipeline, each stage a small, swappable seam:

Stage	Entry point	What it does
source	`CorpusSource`	what is in the corpus + what counts as stale
index	`ir.build`	decompose artifacts into embeddable surfaces, embed, persist (incremental, idempotent)
retrieve	`ir.search`	hard metadata filter + `dense` / `lexical` / `hybrid` ranking
select	`ir.select`	commit to a distractor-robust subset, or abstain
disclose	`ir.disclose`	load the heavy payload (SKILL.md body, package pointer, file text) for committed items — append-only

ir.discover chains retrieve → select → disclose into the single agent-callable (and qh-exposable) tool.
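The index stage is described above as incremental and idempotent, with the source deciding what counts as stale. The source does not say how ir detects staleness, so the following is only a minimal sketch under one common assumption: an artifact is stale when its content hash changes. The names `build_incremental`, `content_hash`, and `toy_embed` are hypothetical and not part of ir.

```python
import hashlib


def content_hash(text):
    """Stable fingerprint of an artifact's text (assumed staleness signal)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def build_incremental(artifacts, index, embed):
    """Update `index` in place and return the ids that were (re-)embedded.

    artifacts: {artifact_id: text}
    index:     {artifact_id: {"hash": str, "vector": list}}
    embed:     callable mapping text to a vector
    """
    refreshed = []
    for artifact_id, text in artifacts.items():
        digest = content_hash(text)
        entry = index.get(artifact_id)
        if entry is not None and entry["hash"] == digest:
            continue  # unchanged: skip the expensive embedding call
        index[artifact_id] = {"hash": digest, "vector": embed(text)}
        refreshed.append(artifact_id)
    # Drop entries whose artifact no longer exists in the source.
    for artifact_id in list(index):
        if artifact_id not in artifacts:
            del index[artifact_id]
    return refreshed


def toy_embed(text):
    """Stand-in for a real embedding model (hypothetical)."""
    return [float(len(text)), float(text.count(" "))]


index = {}
artifacts = {
    "deploy-app": "deploy the app to the server",
    "rotate-keys": "rotate api keys",
}
first = build_incremental(artifacts, index, toy_embed)   # everything is new
second = build_incremental(artifacts, index, toy_embed)  # nothing changed
artifacts["rotate-keys"] = "rotate api keys and certificates"
third = build_incremental(artifacts, index, toy_embed)   # one artifact changed
print(first, second, third)
```

Running the build twice on unchanged input does no work the second time, which is the idempotence property; only the edited artifact is re-embedded on the third call.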

Retrieve

hits = ir.search(corpus, "deploy app", mode="hybrid")   # dense | lexical | hybrid (RRF)

Dense ranking is exact brute-force cosine similarity; lexical ranking is Okapi BM25; hybrid fuses the two by Reciprocal Rank Fusion (RRF), the strongest default for short, identifier-heavy capability text. Lexical and hybrid modes reuse vd; dense mode needs only numpy.
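To make the fusion step concrete, here is a minimal sketch of Reciprocal Rank Fusion over two ranked lists. It is not ir's implementation: the function name `reciprocal_rank_fusion` is hypothetical, and the constant `k=60` is the conventional value from the RRF literature, assumed here because the source does not state which value ir uses.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of ids into one list of (id, score), best first.

    Each list contributes 1 / (k + rank) for every id it contains,
    with ranks starting at 1. Only ranks matter, not raw scores.
    """
    fused = defaultdict(float)
    for ranking in rankings:
        for rank, item_id in enumerate(ranking, start=1):
            fused[item_id] += 1.0 / (k + rank)
    # Sort by fused score, breaking ties by id so the output is deterministic.
    return sorted(fused.items(), key=lambda pair: (-pair[1], pair[0]))


dense_ranking = ["deploy-app", "restart-server", "build-image"]
lexical_ranking = ["deploy-app", "build-image", "rotate-keys"]

fused = reciprocal_rank_fusion([dense_ranking, lexical_ranking])
for item_id, score in fused:
    print(f"{score:.4f}  {item_id}")
```

Because RRF uses ranks rather than raw scores, it can combine cosine similarities and BM25 scores without normalizing either scale. An item that appears in both lists (here `build-image`) outranks one that appears higher in only a single list.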

Select

sel = ir.select(hits)                      # conservative default: stay within rel of top, cap at max_k
sel = ir.select(hits, min_score=0.4)       # opt in to abstention ("nothing applies")
sel = ir.select(hits, strategy="score_gap")  # elbow cut, or "top_k" / "rel_threshold" / a callable

The conservative defaults (max_k=3, rel=0.9) are tuned, not guessed — see ir_06; re-tune for your own corpus with ev.sweep_selector / ir sweep-select.

Selection is relative (ratios to the top score), so one selector works across dense / hybrid / lexical whose absolute scales differ by orders of magnitude. The result carries auditable signals and a reason — no opaque "confidence" float. An optional LLM selector (make_llm_selector, lazy on oa, injectable for tests) falls back to the heuristic on any failure.
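The relative rule described above can be sketched as follows. This is an illustration under stated assumptions, not ir's code: `select_relative` and the `Selection` record are hypothetical names, hits are assumed to be `(name, score)` pairs sorted best first with positive scores, and the defaults mirror the `max_k=3`, `rel=0.9` values quoted in the text.

```python
from dataclasses import dataclass, field


@dataclass
class Selection:
    kept: list
    abstained: bool
    reason: str
    signals: dict = field(default_factory=dict)


def select_relative(hits, rel=0.9, max_k=3, min_score=None):
    """Keep hits scoring at least rel * top score, capped at max_k."""
    if not hits:
        return Selection([], True, "no candidates")
    top = hits[0][1]
    signals = {"top_score": top, "n_candidates": len(hits)}
    # Abstention is opt-in: it only applies when min_score is given.
    if min_score is not None and top < min_score:
        return Selection([], True, "top score below min_score", signals)
    if top <= 0:
        kept = hits[:1]  # ratios are meaningless for non-positive scores
    else:
        kept = [hit for hit in hits if hit[1] >= rel * top][:max_k]
    signals["n_kept"] = len(kept)
    return Selection(kept, False, f"within {rel} of top, capped at {max_k}", signals)


dense_hits = [("deploy-app", 0.82), ("build-image", 0.78), ("rotate-keys", 0.41)]
# Same ordering on a much larger absolute scale, as a lexical ranker might produce.
lexical_hits = [(name, score * 25) for name, score in dense_hits]

dense_sel = select_relative(dense_hits)
lexical_sel = select_relative(lexical_hits)
strict_sel = select_relative(dense_hits, min_score=0.9)
print(dense_sel.kept, dense_sel.reason)
print(strict_sel.abstained, strict_sel.reason)
```

Both calls keep the same two items even though the lexical scores are 25 times larger, which is the scale independence the ratio test buys. The absolute `min_score` check is the one part that does depend on the scoring scale.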

Disclose

payloads = ir.disclose(sel, level="body")  # "metadata" (no I/O) | "body" | "bundled"

Disclosure is a pure read that follows the pointer already stored on each hit (skill_path / path); it never mutates the ranked hits and tolerates a stale pointer. Keeping the agent's context append-only (to protect the prompt cache) remains the caller's responsibility; ir only hands back additive payloads.
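A minimal sketch of that read-only behaviour, assuming each hit is a dict carrying a `path` pointer. The function `disclose_bodies` and the payload keys are hypothetical; ir's actual payload shape is not shown in the source.

```python
import tempfile
from pathlib import Path


def disclose_bodies(hits):
    """Return new payload dicts for the hits; the hits themselves are not modified."""
    payloads = []
    for hit in hits:
        try:
            body = Path(hit["path"]).read_text(encoding="utf-8")
        except OSError:
            body = None  # stale pointer: the file moved or was deleted
        payloads.append({"name": hit["name"], "body": body, "stale": body is None})
    return payloads


with tempfile.TemporaryDirectory() as tmp:
    skill = Path(tmp) / "SKILL.md"
    skill.write_text("# Deploy\nRun the deploy script.\n", encoding="utf-8")
    hits = [
        {"name": "deploy-app", "path": str(skill)},
        {"name": "removed-skill", "path": str(Path(tmp) / "missing" / "SKILL.md")},
    ]
    payloads = disclose_bodies(hits)

for payload in payloads:
    print(payload["name"], "stale" if payload["stale"] else "loaded")
```

A missing file yields a payload flagged as stale instead of raising, so one dead pointer does not abort disclosure for the other committed items.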

Evaluation

ir.eval scores discovery quality offline (reusing ef's retrieval metrics):

from ir import eval as ev

cases = ev.load_cases("skills_eval.jsonl")               # query + gold artifact_id(s)
ev.evaluate_discovery(corpus, cases, mode="hybrid")      # recall@k / NDCG@k / MRR / MAP + failure taxonomy
ev.evaluate_selection(corpus, cases, strategy="conservative")  # conditional commit rate + selection P/R/F1
ev.sweep_selector(corpus, cases)                         # tune max_k × rel; .best() / .frontier() / .table()
ev.distractor_robustness_curve(source.scope, probes)     # accuracy vs catalog size

evaluate_selection's headline is the conditional commit rate — the selection decision isolated from retrieval (did the selector keep the gold, given retrieval surfaced it?). sweep_selector scores a whole max_k × rel grid against the cases off one retrieval pass, so the selector defaults can be read off the data (.best()) rather than guessed. Generate cases by back-translation with ir.eval_gen (needs an LLM; scoring stays offline).
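The conditional commit rate and the grid sweep can be illustrated with a small self-contained sketch. It is an assumption-laden toy rather than ir.eval itself: the helper names, the case format (`gold` plus a `ranked` list of `(name, score)` pairs), and the grid values are all hypothetical.

```python
from itertools import product


def keep_relative(ranked, rel, max_k):
    """Names scoring at least rel * top score, capped at max_k."""
    top = ranked[0][1]
    return [name for name, score in ranked if score >= rel * top][:max_k]


def conditional_commit_rate(cases, rel, max_k):
    """Share of cases where the selector kept the gold item,
    counted only over cases where retrieval surfaced it."""
    surfaced = [c for c in cases if c["gold"] in [name for name, _ in c["ranked"]]]
    if not surfaced:
        return None
    kept = sum(c["gold"] in keep_relative(c["ranked"], rel, max_k) for c in surfaced)
    return kept / len(surfaced)


# One retrieval pass, stored once and reused for every grid cell.
cases = [
    {"gold": "deploy-app", "ranked": [("deploy-app", 0.80), ("build-image", 0.75)]},
    {"gold": "rotate-keys", "ranked": [("audit-log", 0.70), ("rotate-keys", 0.60)]},
    # Retrieval miss: the gold item is absent, so this case is excluded.
    {"gold": "send-email", "ranked": [("deploy-app", 0.50), ("build-image", 0.30)]},
]

grid = {
    (max_k, rel): conditional_commit_rate(cases, rel, max_k)
    for max_k, rel in product([1, 3], [0.8, 0.9])
}
for (max_k, rel), rate in sorted(grid.items()):
    print(f"max_k={max_k} rel={rel} commit_rate={rate:.2f}")
```

Excluding retrieval misses is what isolates the selector: the third case would count against an end-to-end metric, but says nothing about the selection rule. Commit rate alone rewards permissive settings, so a real sweep would weigh it against selection precision before choosing defaults.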

CLI

ir build skills                          # build/update a preset corpus
ir discover skills "deploy the app"      # retrieve -> select
ir discover skills "deploy the app" --disclose   # + load bodies
ir eval-select skills skills_eval.jsonl  # score the selection stage
ir sweep-select skills skills_eval.jsonl # tune the selector (max_k × rel) on your corpus
ir ls                                    # list corpora

Design

The design is grounded in a set of capability-discovery research reports under misc/docs/ (ir_01–ir_05): the single-search-tool pattern, indexing & embedding strategy, evaluation, the ef + vd reuse analysis, and a dense-vs-lexical-vs-hybrid eval run. ir is light by default (numpy / dol) and reuses the ecosystem (ef, vd, oa) only where it composes cleanly.

Release history

0.1.22

Jun 13, 2026

0.1.21

Jun 13, 2026

0.1.20

Jun 13, 2026

0.1.19

Jun 12, 2026

0.1.18

Jun 12, 2026

0.1.17

Jun 12, 2026

0.1.16

Jun 12, 2026

0.1.15

Jun 11, 2026

0.1.14

Jun 11, 2026

0.1.13

Jun 11, 2026

0.1.12

Jun 11, 2026

0.1.11

Jun 11, 2026

0.1.10

Jun 11, 2026

0.1.9

Jun 11, 2026

This version

0.1.8

Jun 7, 2026

0.1.7

Jun 6, 2026

0.1.6

Jun 6, 2026

0.1.5

Jun 6, 2026

0.1.4

Jun 6, 2026

0.1.3

Jun 6, 2026

0.1.2

Jun 5, 2026

0.1.1

Jun 5, 2026

0.0.6

May 23, 2025

0.0.5

May 22, 2025

0.0.4

Oct 10, 2022

0.0.3

Oct 3, 2022

0.0.2

Jan 6, 2021

Download files

Source Distribution

ir-0.1.8.tar.gz (137.7 kB)

Uploaded Jun 7, 2026 Source

Built Distribution

ir-0.1.8-py3-none-any.whl (61.6 kB)

Uploaded Jun 7, 2026 Python 3

File details

Details for the file ir-0.1.8.tar.gz.

File metadata

Download URL: ir-0.1.8.tar.gz
Upload date: Jun 7, 2026
Size: 137.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.11.19 {"installer":{"name":"uv","version":"0.11.19","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for ir-0.1.8.tar.gz
Algorithm	Hash digest
SHA256	`ea8bc97d01b12cf63031d74cc029e3bf729df3d08decb7fd1d9f4185c934a472`
MD5	`79fedfae7c42c6f742e782c3d9695f98`
BLAKE2b-256	`628262804d126fc341723ebbf80c8dda7c7d921832fe6461f812a1537d582ada`

File details

Details for the file ir-0.1.8-py3-none-any.whl.

File metadata

Download URL: ir-0.1.8-py3-none-any.whl
Upload date: Jun 7, 2026
Size: 61.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.11.19 {"installer":{"name":"uv","version":"0.11.19","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for ir-0.1.8-py3-none-any.whl
Algorithm	Hash digest
SHA256	`4c8726f207850237c9bbd865273b4f875f947a7f578527599d28e944f4aed257`
MD5	`0f53e1e5379025083e1242a469588349`
BLAKE2b-256	`8cc90733969c8d34fe8bafab0ea7147bf439299fd8ebc204590bafd0489ef7c6`
