
kayak

kayak is a Mojo-first late-interaction retrieval engine.

For developers using the Python SDK, the supported entrypoint is import kayak. The current monorepo keeps the Python SDK and Mojo engine together on purpose. The documented public SDK boundary is narrower than the full repo surface.

The current scaffold is intentionally narrow:

  • encoder output is treated as an external boundary
  • indexing and exact MaxSim scoring live in Mojo
  • CPU exact search is the first verified path
  • benchmarks and tests are first-class, not an afterthought

Current Layout

  • kayak/contracts/: validated query/document contracts
  • kayak/numeric/: centralized scalar aliases and storage-format constants
  • kayak/index/: packed index layout and optional flat dim128 document layouts
  • kayak/scoring/: exact MaxSim scoring kernels
  • kayak/runtime/: backend boundary, CPU backend first
  • kayak/search/: top-k search orchestration
  • kayak/verifier/: optional candidate-window reranking and verifier pipeline
  • kayak/benchmarks/: deterministic workload profiles and proxy benchmark tasks
  • kayak/eval/: judged tasks and lightweight retrieval metrics
  • kayak/interop/: Python bridge for external encoders and real public subsets
  • kayak/storage/: persisted judged tasks, packed indexes, and optional derived layouts
  • benchmarks/: runnable benchmark entrypoints
  • python/: Python bridge modules, explicit late-interaction objects, and a reference exact backend
  • tests/: runnable unit-test entrypoints using std.testing.TestSuite

Why This Shape

This layout is justified by the current project goal:

  • keep the encoder boundary swappable for MAX or another transformer stack
  • keep the retrieval core in Mojo
  • keep scalar choices centralized so vector/score dtypes can evolve without a rewrite
  • keep hot paths isolated so they are easy to profile and later replace with GPU kernels

The code keeps vector counts explicit because search quality and systems cost both depend on:

  • query vector count
  • document vector count
  • related sparse-attention or pruning budgets

Python Late-Interaction Layer

The repo now includes an additive Python-facing late-interaction layer under python/kayak_bridge/ plus a light python/kayak/ facade for import kayak. The packaging config points at the existing python/ tree instead of introducing a second src/kayak tree on top of the repo's Mojo package layout.

What it owns:

  • explicit LateQuery, LateDocuments, LateIndex, and LateScores objects
  • explicit layout conversions for flat_dim128 queries and hybrid_flat_dim128 indexes
  • exact maxsim and search operations over those objects
  • NumPy and PyTorch input ergonomics without pretending the data is a dense B x T x D tensor problem
  • a pip-installable Python package surface rooted at import kayak

What it does not claim yet:

  • hidden approximation, implicit layout conversion, or overloaded tensor algebra
  • a published wheel that bundles the Mojo toolchain itself

Current backend boundary:

  • numpy_reference: correctness-oriented NumPy reference path
  • mojo_exact_cpu: Mojo-backed exact CPU scoring through a compiled Python extension module
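The operator both backends agree on is exact MaxSim: for each query token, take the best dot-product match over all document tokens, then sum those maxima. A minimal NumPy sketch (the function name is illustrative, not the package API):

```python
import numpy as np

def maxsim_score(query_vectors: np.ndarray, doc_vectors: np.ndarray) -> float:
    """Exact late-interaction MaxSim.

    query_vectors: (num_query_tokens, dim)
    doc_vectors:   (num_doc_tokens, dim)
    """
    # (num_query_tokens, num_doc_tokens) token-level similarity matrix
    sims = query_vectors @ doc_vectors.T
    # best document token per query token, summed over query tokens
    return float(sims.max(axis=1).sum())

# Toy shapes: 2 query tokens, 3 doc tokens, dim 4.
q = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0]])
d = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 0.5, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])
print(maxsim_score(q, d))  # 1.0 + 0.5 = 1.5
```

This is why vector counts stay first-class: the cost of the similarity matrix is the product of the query and document token counts.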

Current packaging boundary:

  • pip install . from a source checkout is verified
  • when Mojo is available at build time, the wheel bundles kayak.mojopkg so the installed package can build the Python extension on demand
  • the current package still expects a local Mojo toolchain at runtime for mojo_exact_cpu
  • fresh-consumer testing verified numpy_reference through a local Pixi package add
  • fresh-consumer testing verified mojo_exact_cpu through pip install /path/to/kayak with mojo present during install
  • fresh-consumer testing did not verify mojo_exact_cpu through pixi add --pypi "kayak @ file://..." because that path did not bundle kayak.mojopkg

Supported public Python boundary:

  • import from kayak
  • treat kayak_bridge as internal and unstable
  • treat the top-level Mojo package kayak/ as engine code, not as the Python SDK

The detailed SDK boundary, install paths, and quickstarts are documented in docs/python_sdk.md.

Example:

import kayak

q = kayak.query(query_vectors)
docs = kayak.documents(["doc-a", "doc-b"], document_vectors)
index = docs.pack().to_layout("hybrid_flat_dim128")

scores = kayak.maxsim(q, docs.pack(), backend=kayak.MOJO_EXACT_CPU_BACKEND)
hits = kayak.search(
    q.to_layout("flat_dim128"),
    index,
    k=10,
    backend=kayak.NUMPY_REFERENCE_BACKEND,
)

Benchmark Coverage

The benchmark layer now has two complementary pieces:

  • workload profiles for system timings across benchmark families
  • tiny judged proxy tasks for fast retrieval-quality checks

The included slices are inspired by public benchmark families that are relevant to late interaction:

  • LoTTE: domain-specific forum retrieval in the ColBERT ecosystem
  • BEIR: heterogeneous factual retrieval across domains
  • MS MARCO: short passage retrieval
  • BRIGHT: reasoning-heavy retrieval
  • MIRACL: multilingual retrieval

Important epistemic boundary:

  • these shipped tasks are proxies, not official benchmark reproductions
  • they are intended to keep the code runnable, fast, and easy to scale later
  • official full-benchmark claims still require running the public datasets and their evaluation protocols

The source rationale for those families is recorded in docs/benchmark_rationale.md.
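The lightweight retrieval metrics the judged proxy tasks check are of the usual ranked-list kind; recall@k, for instance, can be sketched as (illustrative implementation, not the kayak/eval/ code):

```python
def recall_at_k(ranked_doc_ids, relevant_doc_ids, k):
    """Fraction of judged-relevant documents appearing in the top-k ranking."""
    if not relevant_doc_ids:
        return 0.0
    top = set(ranked_doc_ids[:k])
    hits = sum(1 for doc_id in relevant_doc_ids if doc_id in top)
    return hits / len(relevant_doc_ids)

# One relevant doc ("b") inside the top 3, one ("d") outside it.
print(recall_at_k(["a", "b", "c", "d"], {"b", "d"}, 3))  # 0.5
```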

Profiling Benchmarks

The repo now has profiling-oriented microbenchmarks that separate:

  • raw dot_product
  • per-document exact MaxSim
  • backend score_all
  • full search_exact

These benches sweep explicit shapes so vector count stays first-class in the output. The current CPU path uses a SIMD dot_product kernel, a narrow 128-dim fast path for ColBERT-shaped embeddings, and vector-balanced document partitioning for larger exact-search workloads. Both CPU optimizations are now explicitly configurable through ExactScoringConfig, so you can disable the 128-dim fast path or document-level parallel scoring when profiling or comparing kernels. Parallel work-item oversubscription can also be disabled explicitly when you want a strict worker_count partitioning policy, and the work-item count can be overridden directly for scalability sweeps and USL fitting.
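Vector-balanced partitioning means splitting the document list by total vector count rather than document count, so each worker does comparable work in parallel exact scoring. A greedy sketch of that policy (illustrative, not the engine's Mojo code):

```python
def balanced_partitions(doc_vector_counts, num_workers):
    """Split documents into contiguous partitions with roughly equal
    total vector counts, cutting whenever a partition reaches its share."""
    total = sum(doc_vector_counts)
    target = total / num_workers
    partitions, start, acc = [], 0, 0
    for i, count in enumerate(doc_vector_counts):
        acc += count
        # cut here once this partition holds its share of the vectors,
        # but keep at least one document for each remaining worker
        if acc >= target and len(partitions) < num_workers - 1:
            partitions.append((start, i + 1))
            start, acc = i + 1, 0
    partitions.append((start, len(doc_vector_counts)))
    return partitions

# Long documents dominate cost, so cuts track vectors, not doc counts.
print(balanced_partitions([10, 1, 1, 10, 1, 1, 10], 3))
```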

Robustness Layer

The repo now includes a small robustness layer inspired by property-first testing and mutation-quality checks:

  • tests/test_battle.mojo: randomized differential and metamorphic checks for the late-interaction core
  • tests/test_storage_invariants.mojo: corruption and compatibility checks for persisted artifacts
  • tests/test_score_partitions.mojo: vector-balanced partitioning checks for parallel exact scoring
  • tests/test_eval_battle.mojo: metric reference and evaluation invariants
  • python/scripts/mutation_smoke.py: curated mutation-smoke harness for core kernels, metrics, and storage guards

This is documented in docs/robustness_testing.md.

Real Subset Bridge

The real public benchmark path now uses:

  • colbert-ai for ColBERTv2 token embeddings on CPU
  • Mojo for packing, exact search, and evaluation
  • small BEIR/SciFact and BEIR/FIQA subsets as real benchmark slices
  • an official LIMIT-small slice with the full 46-document corpus and a light 32-query subset
  • light BrowseComp-Plus evidence and gold slices built from official decrypted queries, human-verified evidence documents, gold answer documents, and the benchmark's curated hard negatives
  • a query-diagnostic benchmark that scores the same BrowseComp ranking against both evidence and gold qrels
  • repo-local storage so repeated runs can reload encoded tasks and packed indexes

This is still a deliberate smoke-oriented public suite, not a claim of full benchmark reproduction. LoTTE remains a target, but the straightforward official loader path currently pulls a 3.58 GB archive, which is too heavy for the fast smoke workflow this repo wants. BrowseComp-Plus is also intentionally sliced:

  • the official benchmark uses a fixed corpus of about 100k documents and an agent loop
  • the repo keeps the retrieval core honest by using the benchmark's evidence docs, gold docs, and hard negatives
  • the repo does not yet claim full agent-benchmark reproduction inside kayak

The persisted artifacts live under:

  • .cache/kayak/scifact_real_subset/
  • .cache/kayak/fiqa_real_subset/
  • .cache/kayak/limit_small_real_subset/
  • .cache/kayak/browsecomp_plus_real_subset/

For the BrowseComp-Plus slices, the raw Python tasks are also materialized once at:

  • .cache/kayak/browsecomp_plus_real_subset/python_task_evidence.json
  • .cache/kayak/browsecomp_plus_real_subset/python_task_gold.json

The manifest records:

  • storage format version
  • vector scalar type
  • dataset id
  • model name
  • judged task payload
  • packed index payload

Storage format v2 keeps manifests and lightweight metadata in text, but stores the hot vector payloads in binary little-endian form. That is a deliberate compromise:

  • metadata stays easy to inspect by eye
  • vector payloads stop paying TSV parse and size overhead on every reload
  • legacy v1 text payloads still load for compatibility
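The text-metadata / binary-payload split can be sketched in a few lines of Python (field and file names here are assumptions, not the v2 format's actual schema):

```python
import json
import numpy as np

def save_artifact(path_prefix: str, metadata: dict, vectors: np.ndarray):
    # metadata stays human-inspectable text
    with open(path_prefix + ".manifest.json", "w") as f:
        json.dump({**metadata, "shape": list(vectors.shape)}, f)
    # hot payload: raw little-endian float32, no per-value text parsing
    vectors.astype("<f4").tofile(path_prefix + ".vectors.bin")

def load_artifact(path_prefix: str):
    with open(path_prefix + ".manifest.json") as f:
        meta = json.load(f)
    vectors = np.fromfile(path_prefix + ".vectors.bin", dtype="<f4")
    return meta, vectors.reshape(meta["shape"])
```

The reload cost is then a single bulk read plus a reshape, which is the point of moving vector payloads out of TSV.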

The repo also supports an optional persisted hybrid_flat_dim128_index artifact. This is a derived layout for 128-dim document embeddings:

  • it keeps doc_ids and doc_offsets
  • it stores document token values as one flat scalar buffer
  • it is opt-in, not the default exact-search path
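The doc_ids / doc_offsets / flat-buffer shape is a CSR-style layout for variable-length token sequences. A sketch of packing and slicing it (names are illustrative, not the persisted artifact's API):

```python
import numpy as np

def pack_hybrid_flat(doc_ids, per_doc_vectors):
    """Pack variable-length 128-dim document embeddings into one flat
    scalar buffer plus token offsets.

    per_doc_vectors: list of (num_tokens_i, 128) arrays.
    """
    offsets = np.zeros(len(per_doc_vectors) + 1, dtype=np.int64)
    for i, vecs in enumerate(per_doc_vectors):
        offsets[i + 1] = offsets[i] + vecs.shape[0]
    # one contiguous scalar buffer for all document token values
    flat = np.concatenate(per_doc_vectors, axis=0).reshape(-1)
    return doc_ids, offsets, flat

def doc_slice(offsets, flat, i, dim=128):
    """Recover document i's (num_tokens_i, dim) view from the flat buffer."""
    return flat[offsets[i] * dim : offsets[i + 1] * dim].reshape(-1, dim)
```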

For the same late-interaction path, the repo also supports an optional FlatQueryDim128 query layout. This still preserves the full multi-vector query representation. It changes query memory layout only; it does not collapse retrieval into a single-vector search.

That choice is deliberate. The current measurements support keeping it as a first-class optional artifact, but they do not yet support silently replacing the default CPU search path.

Verifier Stage

The repo now includes a narrow third-stage verifier interface:

  • no_verifier: preserves the current exact-search path
  • exact_late_interaction_verifier(candidate_k): reranks a candidate window in Mojo with exact MaxSim

This stage is intentionally vector-only in v0.1. That is an epistemic boundary, not a missing buzzword: the current stored artifacts contain token embeddings and ids, but not the raw text needed for an honest cross-encoder reranker. If we want a text-level verifier later, the storage layer must first persist the necessary text payload explicitly.
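The candidate-window rerank is simple in shape: take the top candidate_k ids from the first stage, rescore only those with exact MaxSim, and re-sort. A NumPy sketch under assumed names (the real stage runs in Mojo):

```python
import numpy as np

def exact_verifier_rerank(query_vectors, candidate_ids, doc_vectors_by_id, candidate_k):
    """Rescore the top candidate_k candidates with exact MaxSim and
    return them in descending verified-score order."""
    window = candidate_ids[:candidate_k]
    scored = []
    for doc_id in window:
        sims = query_vectors @ doc_vectors_by_id[doc_id].T
        scored.append((float(sims.max(axis=1).sum()), doc_id))
    scored.sort(reverse=True)  # best verified score first
    return [doc_id for _, doc_id in scored]
```

Everything it needs is already in the stored artifacts (token embeddings and ids), which is exactly why it can ship now while a text-level verifier cannot.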

Commands

Run the demo:

pixi run demo
pixi run demo_scifact
pixi run demo_fiqa
pixi run demo_python_sdk
pixi run demo_python_sdk_mojo

Run tests:

pixi run test_index
pixi run test_maxsim
pixi run test_eval
pixi run test_proxies
pixi run test_python_bridge
pixi run test_python_api
pixi run test_storage
pixi run test_storage_compat
pixi run test_storage_invariants
pixi run test_score_partitions
pixi run test_hybrid_flat_dim128
pixi run test_verifier
pixi run test_battle
pixi run test_eval_battle

Install the Python package from a source checkout:

python -m pip install .

Run the exact CPU benchmark:

pixi run bench_exact
pixi run bench_profile_exact
pixi run bench_profile_cpu_micro
pixi run bench_profile_cpu_structural
pixi run bench_profile_cpu_structural_real_subset
pixi run bench_profile_cpu_hybrid_real_subset
pixi run bench_profile_cpu_verifier_real_subset
pixi run bench_profile_storage_real_subset
pixi run bench_profile_cpu_configs
pixi run bench_profile_cpu_usl
pixi run fit_usl
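The USL fit takes a worker-count sweep and estimates contention and coherency penalties. A sketch of the curve and a fit against synthetic data (standing in for real bench_profile_cpu_usl output; assumes SciPy is available):

```python
import numpy as np
from scipy.optimize import curve_fit

def usl(n, lam, sigma, kappa):
    """Universal Scalability Law: throughput at n workers, with a
    contention term (sigma) and a coherency term (kappa)."""
    return lam * n / (1.0 + sigma * (n - 1) + kappa * n * (n - 1))

# Synthetic noiseless sweep with known parameters.
workers = np.arange(1, 33, dtype=float)
throughput = usl(workers, 1000.0, 0.05, 0.001)

(lam, sigma, kappa), _ = curve_fit(
    usl, workers, throughput,
    p0=(500.0, 0.01, 0.0001),
    bounds=(0.0, np.inf),
)
print(lam, sigma, kappa)
```

A nonzero kappa predicts a throughput peak and then retrograde scaling, which is why the work-item count override above matters for clean sweeps.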

For lower-noise comparisons on a busy machine, use the quiet benchmark wrapper:

bash scripts/run_bench_quiet.sh --repeats 3 --max-other-cpu 40 -- pixi run bench_scifact

The default pixi run bench_* tasks for performance-sensitive benchmarks now use this quiet wrapper automatically. Use the corresponding *_raw tasks only for quick smoke checks when you explicitly do not want quiet-run gating.

Run the workload matrix and the proxy evaluation matrix:

pixi run bench_matrix
pixi run eval_matrix
pixi run bench_scifact
pixi run bench_fiqa
pixi run bench_limit_small
pixi run bench_browsecomp_plus
pixi run bench_browsecomp_plus_gold
pixi run bench_browsecomp_plus_diag
pixi run bench_browsecomp_plus_ranks
pixi run bench_real_subset_policies
pixi run bench_real_subset_breakdown

Materialize the BrowseComp-Plus task json explicitly if you want to separate the plain-Python encoding step from the Mojo benchmark run:

pixi run build_browsecomp_plus_task_json

That command now materializes both BrowseComp-Plus retrieval variants:

  • evidence qrels
  • gold qrels

Run the curated mutation-smoke check:

pixi run mutate_smoke

Compile the package:

pixi run package_mojo

The compiled Mojo package is written to dist/kayak.mojopkg.

