
kayak

kayak is a Mojo-first late-interaction retrieval engine.

For developers using the Python SDK, the supported entrypoint is import kayak. The current monorepo keeps the Python SDK and Mojo engine together on purpose. The documented public SDK boundary is narrower than the full repo surface.

The current scaffold is intentionally narrow:

  • encoder output is treated as an external boundary
  • indexing and exact MaxSim scoring live in Mojo
  • CPU exact search is the first verified path
  • benchmarks and tests are first-class, not an afterthought

Current Layout

  • kayak/contracts/: validated query/document contracts
  • kayak/numeric/: centralized scalar aliases and storage-format constants
  • kayak/index/: packed index layout and optional flat dim128 document layouts
  • kayak/scoring/: exact MaxSim scoring kernels
  • kayak/runtime/: backend boundary, CPU backend first
  • kayak/search/: top-k search orchestration
  • kayak/verifier/: optional candidate-window reranking and verifier pipeline
  • kayak/benchmarks/: deterministic workload profiles and proxy benchmark tasks
  • kayak/eval/: judged tasks and lightweight retrieval metrics
  • kayak/interop/: Python bridge for external encoders and real public subsets
  • kayak/storage/: persisted judged tasks, packed indexes, and optional derived layouts
  • benchmarks/: runnable benchmark entrypoints
  • python/: Python bridge modules, explicit late-interaction objects, and a reference exact backend
  • tests/: runnable unit-test entrypoints using std.testing.TestSuite

Why This Shape

This layout is justified by the current project goal:

  • keep the encoder boundary swappable for MAX or another transformer stack
  • keep the retrieval core in Mojo
  • keep scalar choices centralized so vector/score dtypes can evolve without a rewrite
  • keep hot paths isolated so they are easy to profile and later replace with GPU kernels

The code keeps vector counts explicit because search quality and systems cost both depend on:

  • query vector count
  • document vector count
  • related sparse-attention or pruning budgets

Python Late-Interaction Layer

The repo now includes an additive Python-facing late-interaction layer under python/kayak_bridge/ plus a light python/kayak/ facade for import kayak. The packaging config points at the existing python/ tree instead of introducing a second src/kayak tree on top of the repo's Mojo package layout.

What it owns:

  • explicit LateQuery, LateDocuments, LateIndex, and LateScores objects
  • explicit layout conversions for flat_dim128 queries and hybrid_flat_dim128 indexes
  • exact maxsim and search operations over those objects
  • NumPy and PyTorch input ergonomics without pretending the data is a dense B x T x D tensor problem
  • a pip-installable Python package surface rooted at import kayak

What it does not claim yet:

  • hidden approximation, implicit layout conversion, or overloaded tensor algebra
  • a published wheel that bundles the Mojo toolchain itself

Current backend boundary:

  • numpy_reference: correctness-oriented NumPy reference path
  • mojo_exact_cpu: Mojo-backed exact CPU scoring through a compiled Python extension module
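The operator both backends agree on is exact MaxSim: for each query token, take the best dot-product match over all document tokens, then sum those maxima. A minimal NumPy sketch (the function name is illustrative, not the package API):

```python
import numpy as np

def maxsim_score(query_vectors: np.ndarray, doc_vectors: np.ndarray) -> float:
    """Exact late-interaction MaxSim.

    query_vectors: (num_query_tokens, dim)
    doc_vectors:   (num_doc_tokens, dim)
    """
    # (num_query_tokens, num_doc_tokens) token-level similarity matrix
    sims = query_vectors @ doc_vectors.T
    # best document token per query token, summed over query tokens
    return float(sims.max(axis=1).sum())

# Toy shapes: 2 query tokens, 3 doc tokens, dim 4.
q = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0]])
d = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 0.5, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])
print(maxsim_score(q, d))  # 1.0 + 0.5 = 1.5
```

This is why vector counts stay first-class: the cost of the similarity matrix is the product of the query and document token counts.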

Current packaging boundary:

  • pip install . from a source checkout is verified
  • when Mojo is available at build time, the wheel bundles kayak.mojopkg so the installed package can build the Python extension on demand
  • the current package still expects a local Mojo toolchain at runtime for mojo_exact_cpu
  • fresh-consumer testing verified numpy_reference through a local Pixi package add
  • fresh-consumer testing verified mojo_exact_cpu through pip install /path/to/kayak with mojo present during install
  • fresh-consumer testing did not verify mojo_exact_cpu through pixi add --pypi "kayak @ file://..." because that path did not bundle kayak.mojopkg

Supported public Python boundary:

  • import from kayak
  • treat kayak_bridge as internal and unstable
  • treat the top-level Mojo package kayak/ as engine code, not as the Python SDK

The detailed SDK boundary, install paths, and quickstarts are documented in docs/python_sdk.md.

Example:

import kayak

q = kayak.query(query_vectors)
docs = kayak.documents(["doc-a", "doc-b"], document_vectors)
index = docs.pack().to_layout("hybrid_flat_dim128")

scores = kayak.maxsim(q, docs.pack(), backend=kayak.MOJO_EXACT_CPU_BACKEND)
hits = kayak.search(
    q.to_layout("flat_dim128"),
    index,
    k=10,
    backend=kayak.NUMPY_REFERENCE_BACKEND,
)

Benchmark Coverage

The benchmark layer now has two complementary pieces:

  • workload profiles for system timings across benchmark families
  • tiny judged proxy tasks for fast retrieval-quality checks

The included slices are inspired by public benchmark families that are relevant to late interaction:

  • LoTTE: domain-specific forum retrieval in the ColBERT ecosystem
  • BEIR: heterogeneous factual retrieval across domains
  • MS MARCO: short passage retrieval
  • BRIGHT: reasoning-heavy retrieval
  • MIRACL: multilingual retrieval

Important epistemic boundary:

  • these shipped tasks are proxies, not official benchmark reproductions
  • they are intended to keep the code runnable, fast, and easy to scale later
  • official full-benchmark claims still require running the public datasets and their evaluation protocols

The source rationale for those families is recorded in docs/benchmark_rationale.md.
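The lightweight retrieval metrics the judged proxy tasks check are of the usual ranked-list kind; recall@k, for instance, can be sketched as (illustrative implementation, not the kayak/eval/ code):

```python
def recall_at_k(ranked_doc_ids, relevant_doc_ids, k):
    """Fraction of judged-relevant documents appearing in the top-k ranking."""
    if not relevant_doc_ids:
        return 0.0
    top = set(ranked_doc_ids[:k])
    hits = sum(1 for doc_id in relevant_doc_ids if doc_id in top)
    return hits / len(relevant_doc_ids)

# One relevant doc ("b") inside the top 3, one ("d") outside it.
print(recall_at_k(["a", "b", "c", "d"], {"b", "d"}, 3))  # 0.5
```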

Profiling Benchmarks

The repo now has profiling-oriented microbenchmarks that separate:

  • raw dot_product
  • per-document exact MaxSim
  • backend score_all
  • full search_exact

These benches sweep explicit shapes so vector count stays first-class in the output. The current CPU path uses a SIMD dot_product kernel, a narrow 128-dim fast path for ColBERT-shaped embeddings, and vector-balanced document partitioning for larger exact-search workloads. Both CPU optimizations are now explicitly configurable through ExactScoringConfig, so you can disable the 128-dim fast path or document-level parallel scoring when profiling or comparing kernels. Parallel work-item oversubscription can also be disabled explicitly when you want a strict worker_count partitioning policy, and the work-item count can be overridden directly for scalability sweeps and USL fitting.
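Vector-balanced partitioning means splitting the document list by total vector count rather than document count, so each worker does comparable work in parallel exact scoring. A greedy sketch of that policy (illustrative, not the engine's Mojo code):

```python
def balanced_partitions(doc_vector_counts, num_workers):
    """Split documents into contiguous partitions with roughly equal
    total vector counts, cutting whenever a partition reaches its share."""
    total = sum(doc_vector_counts)
    target = total / num_workers
    partitions, start, acc = [], 0, 0
    for i, count in enumerate(doc_vector_counts):
        acc += count
        # cut here once this partition holds its share of the vectors,
        # but keep at least one document for each remaining worker
        if acc >= target and len(partitions) < num_workers - 1:
            partitions.append((start, i + 1))
            start, acc = i + 1, 0
    partitions.append((start, len(doc_vector_counts)))
    return partitions

# Long documents dominate cost, so cuts track vectors, not doc counts.
print(balanced_partitions([10, 1, 1, 10, 1, 1, 10], 3))
```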

Robustness Layer

The repo now includes a small robustness layer inspired by property-first testing and mutation-quality checks:

  • tests/test_battle.mojo: randomized differential and metamorphic checks for the late-interaction core
  • tests/test_storage_invariants.mojo: corruption and compatibility checks for persisted artifacts
  • tests/test_score_partitions.mojo: vector-balanced partitioning checks for parallel exact scoring
  • tests/test_eval_battle.mojo: metric reference and evaluation invariants
  • python/scripts/mutation_smoke.py: curated mutation-smoke harness for core kernels, metrics, and storage guards

This is documented in docs/robustness_testing.md.

Real Subset Bridge

The real public benchmark path now uses:

  • colbert-ai for ColBERTv2 token embeddings on CPU
  • Mojo for packing, exact search, and evaluation
  • small BEIR/SciFact and BEIR/FIQA subsets as real benchmark slices
  • an official LIMIT-small slice with the full 46-document corpus and a light 32-query subset
  • light BrowseComp-Plus evidence and gold slices built from official decrypted queries, human-verified evidence documents, gold answer documents, and the benchmark's curated hard negatives
  • a query-diagnostic benchmark that scores the same BrowseComp ranking against both evidence and gold qrels
  • repo-local storage so repeated runs can reload encoded tasks and packed indexes

This is still a deliberate smoke-oriented public suite, not a claim of full benchmark reproduction. LoTTE remains a target, but the straightforward official loader path currently pulls a 3.58 GB archive, which is too heavy for the fast smoke workflow this repo wants. BrowseComp-Plus is also intentionally sliced:

  • the official benchmark uses a fixed corpus of about 100k documents and an agent loop
  • the repo keeps the retrieval core honest by using the benchmark's evidence docs, gold docs, and hard negatives
  • the repo does not yet claim full agent-benchmark reproduction inside kayak

The persisted artifacts live under:

  • .cache/kayak/scifact_real_subset/
  • .cache/kayak/fiqa_real_subset/
  • .cache/kayak/limit_small_real_subset/
  • .cache/kayak/browsecomp_plus_real_subset/

For the BrowseComp-Plus slices, the raw Python tasks are also materialized once at:

  • .cache/kayak/browsecomp_plus_real_subset/python_task_evidence.json
  • .cache/kayak/browsecomp_plus_real_subset/python_task_gold.json

The manifest records:

  • storage format version
  • vector scalar type
  • dataset id
  • model name
  • judged task payload
  • packed index payload

Storage format v2 keeps manifests and lightweight metadata in text, but stores the hot vector payloads in binary little-endian form. That is a deliberate compromise:

  • metadata stays easy to inspect by eye
  • vector payloads stop paying TSV parse and size overhead on every reload
  • legacy v1 text payloads still load for compatibility
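The text-metadata / binary-payload split can be sketched in a few lines of Python (field and file names here are assumptions, not the v2 format's actual schema):

```python
import json
import numpy as np

def save_artifact(path_prefix: str, metadata: dict, vectors: np.ndarray):
    # metadata stays human-inspectable text
    with open(path_prefix + ".manifest.json", "w") as f:
        json.dump({**metadata, "shape": list(vectors.shape)}, f)
    # hot payload: raw little-endian float32, no per-value text parsing
    vectors.astype("<f4").tofile(path_prefix + ".vectors.bin")

def load_artifact(path_prefix: str):
    with open(path_prefix + ".manifest.json") as f:
        meta = json.load(f)
    vectors = np.fromfile(path_prefix + ".vectors.bin", dtype="<f4")
    return meta, vectors.reshape(meta["shape"])
```

The reload cost is then a single bulk read plus a reshape, which is the point of moving vector payloads out of TSV.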

The repo also supports an optional persisted hybrid_flat_dim128_index artifact. This is a derived layout for 128-dim document embeddings:

  • it keeps doc_ids and doc_offsets
  • it stores document token values as one flat scalar buffer
  • it is opt-in, not the default exact-search path
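The doc_ids / doc_offsets / flat-buffer shape is a CSR-style layout for variable-length token sequences. A sketch of packing and slicing it (names are illustrative, not the persisted artifact's API):

```python
import numpy as np

def pack_hybrid_flat(doc_ids, per_doc_vectors):
    """Pack variable-length 128-dim document embeddings into one flat
    scalar buffer plus token offsets.

    per_doc_vectors: list of (num_tokens_i, 128) arrays.
    """
    offsets = np.zeros(len(per_doc_vectors) + 1, dtype=np.int64)
    for i, vecs in enumerate(per_doc_vectors):
        offsets[i + 1] = offsets[i] + vecs.shape[0]
    # one contiguous scalar buffer for all document token values
    flat = np.concatenate(per_doc_vectors, axis=0).reshape(-1)
    return doc_ids, offsets, flat

def doc_slice(offsets, flat, i, dim=128):
    """Recover document i's (num_tokens_i, dim) view from the flat buffer."""
    return flat[offsets[i] * dim : offsets[i + 1] * dim].reshape(-1, dim)
```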

For the same late-interaction path, the repo also supports an optional FlatQueryDim128 query layout. This still preserves the full multi-vector query representation. It changes query memory layout only; it does not collapse retrieval into a single-vector search.

That choice is deliberate. The current measurements support keeping it as a first-class optional artifact, but they do not yet support silently replacing the default CPU search path.

Verifier Stage

The repo now includes a narrow third-stage verifier interface:

  • no_verifier: preserves the current exact-search path
  • exact_late_interaction_verifier(candidate_k): reranks a candidate window in Mojo with exact MaxSim

This stage is intentionally vector-only in v0.1. That is an epistemic boundary, not a missing buzzword: the current stored artifacts contain token embeddings and ids, but not the raw text needed for an honest cross-encoder reranker. If we want a text-level verifier later, the storage layer must first persist the necessary text payload explicitly.
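The candidate-window rerank is simple in shape: take the top candidate_k ids from the first stage, rescore only those with exact MaxSim, and re-sort. A NumPy sketch under assumed names (the real stage runs in Mojo):

```python
import numpy as np

def exact_verifier_rerank(query_vectors, candidate_ids, doc_vectors_by_id, candidate_k):
    """Rescore the top candidate_k candidates with exact MaxSim and
    return them in descending verified-score order."""
    window = candidate_ids[:candidate_k]
    scored = []
    for doc_id in window:
        sims = query_vectors @ doc_vectors_by_id[doc_id].T
        scored.append((float(sims.max(axis=1).sum()), doc_id))
    scored.sort(reverse=True)  # best verified score first
    return [doc_id for _, doc_id in scored]
```

Everything it needs is already in the stored artifacts (token embeddings and ids), which is exactly why it can ship now while a text-level verifier cannot.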

Commands

Run the demo:

pixi run demo
pixi run demo_scifact
pixi run demo_fiqa
pixi run demo_python_sdk
pixi run demo_python_sdk_mojo

Run tests:

pixi run test_index
pixi run test_maxsim
pixi run test_eval
pixi run test_proxies
pixi run test_python_bridge
pixi run test_python_api
pixi run test_storage
pixi run test_storage_compat
pixi run test_storage_invariants
pixi run test_score_partitions
pixi run test_hybrid_flat_dim128
pixi run test_verifier
pixi run test_battle
pixi run test_eval_battle

Install the Python package from a source checkout:

python -m pip install .

Run the exact CPU benchmark:

pixi run bench_exact
pixi run bench_profile_exact
pixi run bench_profile_cpu_micro
pixi run bench_profile_cpu_structural
pixi run bench_profile_cpu_structural_real_subset
pixi run bench_profile_cpu_hybrid_real_subset
pixi run bench_profile_cpu_verifier_real_subset
pixi run bench_profile_storage_real_subset
pixi run bench_profile_cpu_configs
pixi run bench_profile_cpu_usl
pixi run fit_usl
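The USL fit takes a worker-count sweep and estimates contention and coherency penalties. A sketch of the curve and a fit against synthetic data (standing in for real bench_profile_cpu_usl output; assumes SciPy is available):

```python
import numpy as np
from scipy.optimize import curve_fit

def usl(n, lam, sigma, kappa):
    """Universal Scalability Law: throughput at n workers, with a
    contention term (sigma) and a coherency term (kappa)."""
    return lam * n / (1.0 + sigma * (n - 1) + kappa * n * (n - 1))

# Synthetic noiseless sweep with known parameters.
workers = np.arange(1, 33, dtype=float)
throughput = usl(workers, 1000.0, 0.05, 0.001)

(lam, sigma, kappa), _ = curve_fit(
    usl, workers, throughput,
    p0=(500.0, 0.01, 0.0001),
    bounds=(0.0, np.inf),
)
print(lam, sigma, kappa)
```

A nonzero kappa predicts a throughput peak and then retrograde scaling, which is why the work-item count override above matters for clean sweeps.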

For lower-noise comparisons on a busy machine, use the quiet benchmark wrapper:

bash scripts/run_bench_quiet.sh --repeats 3 --max-other-cpu 40 -- pixi run bench_scifact

The default pixi run bench_* tasks for performance-sensitive benchmarks now use this quiet wrapper automatically. Use the corresponding *_raw tasks only for quick smoke checks when you explicitly do not want quiet-run gating.

Run the workload matrix and the proxy evaluation matrix:

pixi run bench_matrix
pixi run eval_matrix
pixi run bench_scifact
pixi run bench_fiqa
pixi run bench_limit_small
pixi run bench_browsecomp_plus
pixi run bench_browsecomp_plus_gold
pixi run bench_browsecomp_plus_diag
pixi run bench_browsecomp_plus_ranks
pixi run bench_real_subset_policies
pixi run bench_real_subset_breakdown

Materialize the BrowseComp-Plus task json explicitly if you want to separate the plain-Python encoding step from the Mojo benchmark run:

pixi run build_browsecomp_plus_task_json

That command now materializes both BrowseComp-Plus retrieval variants:

  • evidence qrels
  • gold qrels

Run the curated mutation-smoke check:

pixi run mutate_smoke

Compile the package:

pixi run package_mojo

The compiled Mojo package is written to dist/kayak.mojopkg.

