kayak
kayak is a Mojo-first late-interaction retrieval engine.
For developers using the Python SDK, the supported entrypoint is import kayak.
The current monorepo keeps the Python SDK and Mojo engine together on purpose.
The documented public SDK boundary is narrower than the full repo surface.
The current scaffold is intentionally narrow:
- encoder output is treated as an external boundary
- indexing and exact MaxSim scoring live in Mojo
- CPU exact search is the first verified path
- benchmarks and tests are first-class, not an afterthought
Current Layout
- `kayak/contracts/`: validated query/document contracts
- `kayak/numeric/`: centralized scalar aliases and storage-format constants
- `kayak/index/`: packed index layout and optional flat dim128 document layouts
- `kayak/scoring/`: exact MaxSim scoring kernels
- `kayak/runtime/`: backend boundary, CPU backend first
- `kayak/search/`: top-k search orchestration
- `kayak/verifier/`: optional candidate-window reranking and verifier pipeline
- `kayak/benchmarks/`: deterministic workload profiles and proxy benchmark tasks
- `kayak/eval/`: judged tasks and lightweight retrieval metrics
- `kayak/interop/`: Python bridge for external encoders and real public subsets
- `kayak/storage/`: persisted judged tasks, packed indexes, and optional derived layouts
- `benchmarks/`: runnable benchmark entrypoints
- `python/`: Python bridge modules, explicit late-interaction objects, and a reference exact backend
- `tests/`: runnable unit-test entrypoints using `std.testing.TestSuite`
Why This Shape
This layout is justified by the current project goal:
- keep the encoder boundary swappable for MAX or another transformer stack
- keep the retrieval core in Mojo
- keep scalar choices centralized so vector/score dtypes can evolve without a rewrite
- keep hot paths isolated so they are easy to profile and later replace with GPU kernels
The code keeps vector counts explicit because search quality and systems cost both depend on:
- query vector count
- document vector count
- related sparse-attention or pruning budgets
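Concretely, those counts matter because exact MaxSim compares every query vector against every document vector. A minimal NumPy sketch of the scoring rule (an illustrative helper, not the repo's Mojo kernel):

```python
import numpy as np

def maxsim(query_vecs: np.ndarray, doc_vecs: np.ndarray) -> float:
    """Exact MaxSim: for each query vector, take the best-matching
    document vector, then sum those maxima over the query vectors."""
    sims = query_vecs @ doc_vecs.T           # (Q, D) pairwise dot products
    return float(sims.max(axis=1).sum())     # work scales with Q * D * dim

q = np.array([[1.0, 0.0], [0.0, 1.0]])                 # 2 query vectors
d = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])     # 3 document vectors
print(maxsim(q, d))  # 2.0
```

The `Q * D` similarity matrix is exactly why query and document vector counts stay explicit throughout the code.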
Python Late-Interaction Layer
The repo now includes an additive Python-facing late-interaction layer under
python/kayak_bridge/ plus a light python/kayak/ facade for import kayak.
The packaging config points Python packaging at the existing python/ tree
instead of introducing a second src/kayak tree on top of the repo's Mojo
package layout.
What it owns:
- explicit `LateQuery`, `LateDocuments`, `LateIndex`, and `LateScores` objects
- explicit layout conversions for `flat_dim128` queries and `hybrid_flat_dim128` indexes
- exact `maxsim` and `search` operations over those objects
- NumPy and PyTorch input ergonomics without pretending the data is a dense `B x T x D` tensor problem
- a pip-installable Python package surface rooted at `import kayak`
What it does not claim yet:
- hidden approximation, implicit layout conversion, or overloaded tensor algebra
- a published wheel that bundles the Mojo toolchain itself
Current backend boundary:
- `numpy_reference`: correctness-oriented NumPy reference path
- `mojo_exact_cpu`: Mojo-backed exact CPU scoring through a compiled Python extension module
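As an illustration of that boundary, the two backends can be pictured as entries in a name-to-implementation mapping. The sketch below is hypothetical structure, not the bridge's actual code:

```python
import numpy as np

def numpy_reference_score_all(query_vecs, docs):
    """Correctness-oriented reference: exact MaxSim against every document."""
    scores = []
    for doc_vecs in docs:
        sims = query_vecs @ doc_vecs.T              # (Q, D) dot products
        scores.append(float(sims.max(axis=1).sum()))
    return scores

# Hypothetical registry keyed by the backend names above.
BACKENDS = {
    "numpy_reference": numpy_reference_score_all,
    # "mojo_exact_cpu" would dispatch into the compiled extension module.
}

q = np.array([[1.0, 0.0]])
docs = [np.array([[1.0, 0.0]]), np.array([[0.0, 1.0]])]
print(BACKENDS["numpy_reference"](q, docs))  # [1.0, 0.0]
```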
Current packaging boundary:
- `pip install .` from a source checkout is verified
- repo-head builds now stage bundled engine sources under `kayak_bridge/_engine/kayak` inside the Python distribution
- when Mojo is available at build time, repo-head builds also bundle `kayak_bridge/_artifacts/kayak.mojopkg`
- the current package still expects a local Mojo toolchain at runtime for `mojo_exact_cpu`
- fresh-consumer validation on 2026-04-12 verified published `kayak 0.1.1` for `numpy_reference` through:
  - `python -m pip install kayak` in a fresh Python 3.11 environment
  - `uv add kayak` in a fresh project constrained to Python `>=3.11,<3.12`
  - `pixi add --pypi kayak` in a fresh Pixi project with `python=3.11`
- plain `pixi add kayak` did not work because no conda package was found for `kayak`
- the published package did not contain `kayak_bridge/_artifacts/kayak.mojopkg`
- because of that missing artifact, `mojo_exact_cpu` failed after published installs, including in a fresh Pixi environment that already had `mojo`
- local repo-head validation on 2026-04-12 verified `uv build` produced:
  - an sdist that includes the top-level `kayak/` Mojo sources
  - a wheel that includes both `kayak_bridge/_artifacts/kayak.mojopkg` and `kayak_bridge/_engine/kayak/...`
- fresh-consumer validation on 2026-04-12 verified `mojo_exact_cpu` from that locally built wheel after:
  - `pixi init .`
  - `pixi add python=3.11 mojo`
  - `pixi run python -m ensurepip --upgrade`
  - `pixi run python -m pip install /path/to/kayak-<version>-py3-none-any.whl`
Supported public Python boundary:
- import from `kayak`
- treat `kayak_bridge` as internal and unstable
- treat the top-level Mojo package `kayak/` as engine code, not as the Python SDK
The package-scoped Python README lives at python/kayak/README.md.
The detailed SDK boundary, install paths, and quickstarts are documented in docs/python_sdk.md.
Example:
```python
import kayak

q = kayak.query(query_vectors)
docs = kayak.documents(["doc-a", "doc-b"], document_vectors)
index = docs.pack().to_layout("hybrid_flat_dim128")
scores = kayak.maxsim(q, docs.pack(), backend=kayak.MOJO_EXACT_CPU_BACKEND)
hits = kayak.search(
    q.to_layout("flat_dim128"),
    index,
    k=10,
    backend=kayak.NUMPY_REFERENCE_BACKEND,
)
```
Benchmark Coverage
The benchmark layer now has two complementary pieces:
- workload profiles for system timings across benchmark families
- tiny judged proxy tasks for fast retrieval-quality checks
The included slices are inspired by public benchmark families that are relevant to late interaction:
- LoTTE: domain-specific forum retrieval in the ColBERT ecosystem
- BEIR: heterogeneous factual retrieval across domains
- MS MARCO: short passage retrieval
- BRIGHT: reasoning-heavy retrieval
- MIRACL: multilingual retrieval
Important epistemic boundary:
- these shipped tasks are proxies, not official benchmark reproductions
- they are intended to keep the code runnable, fast, and easy to scale later
- official full-benchmark claims still require running the public datasets and their evaluation protocols
The source rationale for those families is recorded in docs/benchmark_rationale.md.
Profiling Benchmarks
The repo now has profiling-oriented microbenchmarks that separate:
- raw `dot_product`
- per-document exact MaxSim
- backend `score_all`
- full `search_exact`
These benches sweep explicit shapes so vector count stays first-class in the output.
The current CPU path uses a SIMD dot_product kernel, a narrow 128-dim
fast path for ColBERT-shaped embeddings, and vector-balanced document
partitioning for larger exact-search workloads.
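One common way to realize vector-balanced partitioning is a greedy assignment: each document goes to the partition with the smallest running vector total, so partitions end up with similar work even when document lengths vary. A hypothetical sketch (the repo's actual policy may differ):

```python
import heapq

def partition_by_vectors(doc_vector_counts, worker_count):
    """Greedy balancing: each doc goes to the currently lightest worker."""
    heap = [(0, w) for w in range(worker_count)]   # (total vectors, worker id)
    parts = [[] for _ in range(worker_count)]
    for doc_id, count in enumerate(doc_vector_counts):
        total, w = heapq.heappop(heap)             # lightest worker so far
        parts[w].append(doc_id)
        heapq.heappush(heap, (total + count, w))
    return parts

# Two long docs and three short ones still split into ~equal vector totals.
parts = partition_by_vectors([300, 20, 25, 280, 30], worker_count=2)
print(parts)  # [[0, 4], [1, 2, 3]] -> totals 330 vs 325
```

Balancing on vector counts rather than document counts is what keeps one worker from being stuck with all the long documents.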
Both CPU optimizations are now explicitly configurable through
ExactScoringConfig, so you can disable the 128-dim fast path or
document-level parallel scoring when profiling or comparing kernels.
Parallel work-item oversubscription can also be disabled explicitly when you
want a strict worker_count partitioning policy, and the work-item count can
be overridden directly for scalability sweeps and USL fitting.
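For context on the USL fitting mentioned above, the Universal Scalability Law models throughput versus worker count with a contention term and a coherency term. A minimal sketch of the model itself (illustrative, separate from the repo's `fit_usl` harness):

```python
def usl_throughput(n, lam, sigma, kappa):
    """USL: X(n) = lam * n / (1 + sigma*(n - 1) + kappa*n*(n - 1)).
    lam   = ideal single-worker throughput,
    sigma = contention (serialization) penalty,
    kappa = coherency (crosstalk) penalty."""
    return lam * n / (1.0 + sigma * (n - 1) + kappa * n * (n - 1))

# With zero contention and coherency cost, speedup is linear.
print(usl_throughput(8, lam=100.0, sigma=0.0, kappa=0.0))  # 800.0

# Nonzero sigma and kappa bend the curve; the kappa term eventually
# makes adding workers counterproductive.
print(usl_throughput(8, lam=100.0, sigma=0.05, kappa=0.01) < 800.0)  # True
```

Fitting `sigma` and `kappa` against measured scalability sweeps is what the overridable work-item count is for.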
Robustness Layer
The repo now includes a small robustness layer inspired by property-first testing and mutation-quality checks:
- `tests/test_battle.mojo`: randomized differential and metamorphic checks for the late-interaction core
- `tests/test_storage_invariants.mojo`: corruption and compatibility checks for persisted artifacts
- `tests/test_score_partitions.mojo`: vector-balanced partitioning checks for parallel exact scoring
- `tests/test_eval_battle.mojo`: metric reference and evaluation invariants
- `python/scripts/mutation_smoke.py`: curated mutation-smoke harness for core kernels, metrics, and storage guards
This is documented in docs/robustness_testing.md.
Real Subset Bridge
The real public benchmark path now uses:
- `colbert-ai` for ColBERTv2 token embeddings on CPU
- Mojo for packing, exact search, and evaluation
- small `BEIR/SciFact` and `BEIR/FIQA` subsets as real benchmark slices
- an official `LIMIT-small` slice with the full 46-document corpus and a light 32-query subset
- light BrowseComp-Plus evidence and gold slices built from official decrypted queries, human-verified evidence documents, gold answer documents, and the benchmark's curated hard negatives
- a query-diagnostic benchmark that scores the same BrowseComp ranking against both evidence and gold qrels
- repo-local storage so repeated runs can reload encoded tasks and packed indexes
This is still a deliberate smoke-oriented public suite, not a claim of full benchmark reproduction.
LoTTE remains a target, but the straightforward official loader path currently pulls a 3.58 GB archive, which is too heavy for the fast smoke workflow this repo wants.
BrowseComp-Plus is also intentionally sliced:
- the official benchmark uses a fixed corpus of about 100k documents and an agent loop
- the repo keeps the retrieval core honest by using the benchmark's evidence docs, gold docs, and hard negatives
- the repo does not yet claim full agent-benchmark reproduction inside `kayak`
The persisted artifacts live under:
- `.cache/kayak/scifact_real_subset/`
- `.cache/kayak/fiqa_real_subset/`
- `.cache/kayak/limit_small_real_subset/`
- `.cache/kayak/browsecomp_plus_real_subset/`
For the BrowseComp-Plus slices, the raw Python tasks are also materialized once at:
- `.cache/kayak/browsecomp_plus_real_subset/python_task_evidence.json`
- `.cache/kayak/browsecomp_plus_real_subset/python_task_gold.json`
The manifest records:
- storage format version
- vector scalar type
- dataset id
- model name
- judged task payload
- packed index payload
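For illustration only, a manifest carrying those fields might look like the following. The actual on-disk key names and text format are not specified here; JSON is just an assumption for the sketch, and every value below is a placeholder:

```python
import json

# Hypothetical key names mirroring the recorded fields listed above.
manifest = {
    "storage_format_version": 2,
    "vector_scalar_type": "float32",
    "dataset_id": "scifact_real_subset",
    "model_name": "colbertv2-placeholder",
    "judged_task_payload": "judged_task.json",
    "packed_index_payload": "packed_index.bin",
}

# Text metadata stays trivially round-trippable and human-inspectable.
text = json.dumps(manifest, indent=2)
print(json.loads(text) == manifest)  # True
```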
Storage format v2 keeps manifests and lightweight metadata in text, but stores
the hot vector payloads in binary little-endian form.
That is a deliberate compromise:
- metadata stays easy to inspect by eye
- vector payloads stop paying TSV parse and size overhead on every reload
- legacy `v1` text payloads still load for compatibility
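The binary little-endian payload idea can be sketched in NumPy: pin the dtype to `<f4` (little-endian float32) on write so the bytes reload unambiguously, with shape tracked by the text metadata. This is illustrative, not the repo's exact on-disk format:

```python
import numpy as np

# Force little-endian float32 so the byte layout is platform-independent.
vectors = np.random.default_rng(0).standard_normal((4, 128)).astype("<f4")

# Write: raw bytes only; the (rows, dim) shape lives in text metadata.
payload = vectors.tobytes()

# Read: reinterpret the bytes with the explicit '<f4' dtype and known shape,
# with no per-value text parsing on reload.
restored = np.frombuffer(payload, dtype="<f4").reshape(4, 128)
print(np.array_equal(vectors, restored))  # True
```

The round trip is byte-exact, which is the whole point of dropping the TSV parse from the hot reload path.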
The repo also supports an optional persisted hybrid_flat_dim128_index artifact.
This is a derived layout for 128-dim document embeddings:
- it keeps `doc_ids` and `doc_offsets`
- it stores document token values as one flat scalar buffer
- it is opt-in, not the default exact-search path
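The flat-buffer-plus-offsets idea can be illustrated in NumPy: one contiguous buffer of token vectors, with `doc_offsets` marking where each document's rows begin and end (hypothetical variable names, not the repo's Mojo types):

```python
import numpy as np

doc_ids = ["doc-a", "doc-b"]
doc_a = np.ones((3, 128), dtype=np.float32)        # 3 token vectors
doc_b = np.full((2, 128), 2.0, dtype=np.float32)   # 2 token vectors

flat = np.concatenate([doc_a, doc_b])              # (5, 128) flat buffer
doc_offsets = np.array([0, 3, 5])                  # row offsets per document

# Document i is the row slice doc_offsets[i]:doc_offsets[i+1].
b = flat[doc_offsets[1]:doc_offsets[2]]
print(b.shape, float(b[0, 0]))  # (2, 128) 2.0
```

Because every document is a contiguous slice of one buffer, exact scoring can stream it without per-document allocations.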
For the same late-interaction path, the repo also supports an optional
FlatQueryDim128 query layout.
This still preserves the full multi-vector query representation.
It changes query memory layout only; it does not collapse retrieval into a
single-vector search.
That choice is deliberate. The current measurements support keeping it as a first-class optional artifact, but they do not yet support silently replacing the default CPU search path.
Verifier Stage
The repo now includes a narrow third-stage verifier interface:
- `no_verifier`: preserves the current exact-search path
- `exact_late_interaction_verifier(candidate_k)`: reranks a candidate window in Mojo with exact MaxSim
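The candidate-window idea behind `exact_late_interaction_verifier(candidate_k)` can be sketched in NumPy: a cheap first stage proposes candidates, and only that window is rescored with exact MaxSim. Helper names here are hypothetical, not the Mojo implementation:

```python
import numpy as np

def maxsim(q, d):
    return float((q @ d.T).max(axis=1).sum())

def rerank_window(query_vecs, docs, first_stage_scores, candidate_k):
    """Rerank only the top candidate_k first-stage hits with exact MaxSim."""
    window = np.argsort(first_stage_scores)[::-1][:candidate_k]
    exact = [(maxsim(query_vecs, docs[i]), i) for i in window]
    return [int(i) for _, i in sorted(exact, reverse=True)]

q = np.array([[1.0, 0.0], [0.0, 1.0]])
docs = [np.array([[1.0, 0.0]]),
        np.array([[0.0, 1.0], [1.0, 0.0]]),
        np.array([[0.5, 0.5]])]
cheap = [0.9, 0.2, 0.8]        # first stage undervalues doc 1
print(rerank_window(q, docs, cheap, candidate_k=3))  # [1, 2, 0]
```

Exact rescoring promotes doc 1, which covers both query vectors, even though the cheap stage ranked it last.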
This stage is intentionally vector-only in v0.1.
That is an epistemic boundary, not a missing buzzword: the current stored artifacts contain token embeddings and ids, but not the raw text needed for an honest cross-encoder reranker.
If we want a text-level verifier later, the storage layer must first persist the necessary text payload explicitly.
There is now a narrow BrowseComp clause-text reranker prototype for benchmarking. It loads document text from the already-materialized BrowseComp JSON task cache instead of pretending the generic verifier pipeline has text available. That is deliberate:
- the prototype measures whether text-aware reranking can recover answer-bearing docs already present in the candidate window
- it is not yet the generic default verifier path
Commands
Run the demo:
```bash
pixi run demo
pixi run demo_scifact
pixi run demo_fiqa
pixi run demo_python_sdk
pixi run demo_python_sdk_mojo
```
Run tests:
```bash
pixi run test_index
pixi run test_maxsim
pixi run test_eval
pixi run test_proxies
pixi run test_python_bridge
pixi run test_python_api
pixi run test_storage
pixi run test_storage_compat
pixi run test_storage_invariants
pixi run test_score_partitions
pixi run test_hybrid_flat_dim128
pixi run test_verifier
pixi run test_battle
pixi run test_eval_battle
```
Install the Python package from a source checkout:
```bash
python -m pip install .
```
Run the exact CPU benchmark:
```bash
pixi run bench_exact
pixi run bench_profile_exact
pixi run bench_profile_cpu_micro
pixi run bench_profile_cpu_structural
pixi run bench_profile_cpu_structural_real_subset
pixi run bench_profile_cpu_hybrid_real_subset
pixi run bench_profile_cpu_verifier_real_subset
pixi run bench_profile_storage_real_subset
pixi run bench_profile_cpu_configs
pixi run bench_profile_cpu_usl
pixi run fit_usl
```
For lower-noise comparisons on a busy machine, use the quiet benchmark wrapper:
```bash
bash scripts/run_bench_quiet.sh --repeats 3 --max-other-cpu 40 -- pixi run bench_scifact
```
The default pixi run bench_* tasks for performance-sensitive benchmarks now
use this quiet wrapper automatically. Use the corresponding *_raw tasks only
for quick smoke checks when you explicitly do not want quiet-run gating.
Run the workload matrix and the proxy evaluation matrix:
```bash
pixi run bench_matrix
pixi run eval_matrix
pixi run bench_scifact
pixi run bench_fiqa
pixi run bench_limit_small
pixi run bench_browsecomp_plus
pixi run bench_browsecomp_plus_gold
pixi run bench_browsecomp_plus_diag
pixi run bench_browsecomp_plus_ranks
pixi run bench_browsecomp_plus_clause
pixi run bench_real_subset_policies
pixi run bench_real_subset_breakdown
```
Materialize the BrowseComp-Plus task json explicitly if you want to separate the plain-Python encoding step from the Mojo benchmark run:
```bash
pixi run build_browsecomp_plus_task_json
```
That command now materializes both BrowseComp-Plus retrieval variants:
- evidence qrels
- gold qrels
Run the curated mutation-smoke check:
```bash
pixi run mutate_smoke
```
Compile the package:
```bash
pixi run package_mojo
```
The compiled Mojo package is written to dist/kayak.mojopkg.
Project details
Download files
File details
Details for the file kayak-0.1.2.tar.gz.
File metadata
- Download URL: kayak-0.1.2.tar.gz
- Upload date:
- Size: 57.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.7.16
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `a61211eb81995b9a69493fa4418fa28d46150e90850e5d65aaaa19f819e76208` |
| MD5 | `cd9a98387f3f868bfcc395e4ab9ede24` |
| BLAKE2b-256 | `77d00affd25881349288824cd43cbdaa93d2e43efa34f91643e9c3ad3a5040f4` |
File details
Details for the file kayak-0.1.2-py3-none-any.whl.
File metadata
- Download URL: kayak-0.1.2-py3-none-any.whl
- Upload date:
- Size: 2.0 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.7.16
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `b8c918851eb194a81ae34fb90c793794285afef302110829726c051cdbdfe539` |
| MD5 | `7a5f4ae4f69798f057055984b8396d27` |
| BLAKE2b-256 | `a9d19391c8cbb453ac77af204d994fbbd4ec549be79c07fbb94620eb77b49f2b` |