
State-of-the-art index for late-interaction multivector retrieval


TACHIOM

TACHIOM is a fast and scalable data structure for late-interaction multi-vector retrieval, written in Rust with Python bindings. It introduces Token-Aware Clustering (TAC), which distributes the coarse-centroid budget proportionally across token types, and a hierarchical Product Quantization scheme for efficient candidate reranking.
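To give a rough sense of what "distributing the coarse-centroid budget proportionally across token types" can mean, the sketch below splits a total centroid budget across vocabulary ids in proportion to how often each id occurs in the collection. It uses largest-remainder rounding so the shares add up exactly to the budget. The helper name `allocate_centroid_budget`, the choice of raw occurrence counts as the proportionality weight, and the rounding scheme are all assumptions made for illustration. The allocation TAC actually performs may differ (see the paper and docs/PythonUsage.md).

```python
from collections import Counter


def allocate_centroid_budget(token_ids, total_centroids):
    """Split `total_centroids` across token types in proportion to their frequency.

    Hypothetical helper for illustration only; it is not part of the tachiom API
    and may differ from the allocation TAC actually performs. Shares are computed
    with exact integer arithmetic and largest-remainder rounding, so they always
    sum to `total_centroids`.
    """
    counts = Counter(token_ids)  # occurrences of each vocabulary id
    n_tokens = sum(counts.values())
    if n_tokens == 0:
        return {}

    alloc, remainders = {}, {}
    for tok, count in counts.items():
        # Floor of the proportional share, plus what flooring threw away.
        alloc[tok], remainders[tok] = divmod(total_centroids * count, n_tokens)

    # Give the centroids lost to flooring to the types with the largest remainders.
    leftover = total_centroids - sum(alloc.values())
    for tok in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        alloc[tok] += 1
    return alloc


if __name__ == "__main__":
    # Toy corpus: id 101 occurs 3 times, 2023 twice, 1012 once.
    print(allocate_centroid_budget([101, 101, 101, 2023, 2023, 1012], total_centroids=4))
    # → {101: 2, 2023: 1, 1012: 1}
```

Largest-remainder rounding is used here only because it keeps the total exact. Even with a large budget, very rare token types can end up with zero centroids under this scheme, and this sketch does not model how TAC treats them.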

Installation

Python

Quick start (prebuilt wheels)

For most users, this is the easiest option:

pip install tachiom

If a compatible wheel exists for your platform, pip downloads and installs it directly, without compilation. Otherwise, pip falls back to compiling the package from its source distribution, which requires the Rust nightly toolchain described below.
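To estimate ahead of time whether pip install tachiom will find a prebuilt wheel, you can compare your interpreter against the wheels published for release 0.2.0. These cover CPython 3.10 to 3.13 on Linux x86-64 with glibc 2.39+, on macOS 11.0+ ARM64, and on macOS 10.12+ x86-64. The function below is a rough heuristic written for this page, not part of tachiom or pip. It will go stale as later releases add platforms, and pip's own tag matching remains authoritative.

```python
import platform
import sys

# Wheel matrix published for tachiom 0.2.0 (from the release file list).
# Other releases may target different Python versions or platforms.
SUPPORTED_PYTHON = {(3, 10), (3, 11), (3, 12), (3, 13)}
MIN_GLIBC = (2, 39)
MIN_MACOS = {"arm64": (11, 0), "x86_64": (10, 12)}


def _version_tuple(text):
    """'2.39' -> (2, 39); '14.5.1' -> (14, 5, 1); '' -> ()."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def prebuilt_wheel_likely(python, implementation, system, machine,
                          libc="", libc_version="", mac_version=""):
    """Rough heuristic: could a tachiom 0.2.0 wheel match this interpreter?"""
    if implementation != "CPython" or tuple(python[:2]) not in SUPPORTED_PYTHON:
        return False
    if system == "Linux":
        # manylinux_2_39_x86_64: x86-64 with glibc 2.39 or newer.
        return (machine == "x86_64" and libc == "glibc"
                and _version_tuple(libc_version) >= MIN_GLIBC)
    if system == "Darwin":
        minimum = MIN_MACOS.get(machine)
        return minimum is not None and _version_tuple(mac_version) >= minimum
    return False  # no wheels for Windows or other systems in 0.2.0


if __name__ == "__main__":
    libc, libc_version = platform.libc_ver()
    likely = prebuilt_wheel_likely(
        sys.version_info, platform.python_implementation(),
        platform.system(), platform.machine(),
        libc, libc_version, platform.mac_ver()[0],
    )
    print("prebuilt wheel likely" if likely
          else "pip will probably build from source (needs Rust nightly)")
```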

Building from source (maximum performance)

For maximum performance, build from source so that the Rust code is compiled specifically for your CPU.

Shared prerequisites — both approaches below require Rust nightly:

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
rustup install nightly
rustup default nightly

Approach 1 — compile from PyPI source:

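RUSTFLAGS="-C target-cpu=native" pip install --no-binary tachiom tachiom

(--no-binary tachiom forces a source build of tachiom only; with :all:, pip would also rebuild every dependency from source.)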

Approach 2 — build from GitHub (development/editable mode):

git clone https://github.com/TusKANNy/tachiom.git
cd tachiom

Create a virtual environment (recommended):

python3 -m venv ./venv
source ./venv/bin/activate  # On Windows: venv\Scripts\activate

Or with conda:

conda create -n tachiom python=3.11
conda activate tachiom

Install maturin and build:

pip install maturin
RUSTFLAGS="-C target-cpu=native" maturin develop --release

Changes to the Python code take effect immediately without reinstalling, which is ideal for development. Changes to the Rust code still require re-running maturin develop --release.

Rust

To compile all the Rust binaries in src/bin/:

RUSTFLAGS="-C target-cpu=native" cargo build --release

Details on how to use Tachiom's Rust CLI can be found in docs/RustUsage.md.

Quick start

import numpy as np
import tachiom

# ── Build ─────────────────────────────────────────────────────────────────────
# Inputs (all .npy files):
#   vectors.npy    : [N, dim]   f16  one row per token
#   token_ids.npy  : [N]        i64  vocabulary id of each token
#   doclens.npy    : [n_docs]   i32  number of tokens per document

index = tachiom.Tachiom.build(
    "vectors.npy",
    "token_ids.npy",
    "doclens.npy",
    total_centroids=2_097_152,
)
index.save("my_index.bin")

# ── Load & search ─────────────────────────────────────────────────────────────
index = tachiom.Tachiom.load("my_index.bin")

# queries: [n_queries, n_tokens, dim] f32 array
# (assumes queries.npy stores the query token embeddings with this 3-D shape)
queries = np.load("queries.npy").astype(np.float32)
scores, doc_ids = index.batch_search(queries, k=10, num_threads=0)
# scores, doc_ids: [n_queries, k]

See docs/PythonUsage.md for the full API, all build and search parameters, and the two-step TAC workflow.
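The three build inputs must agree with each other. vectors.npy and token_ids.npy need one row per token, and the entries of doclens.npy must sum to that token count. The sketch below writes a small random dataset in that layout (float16 vectors, int64 token ids, int32 document lengths, as listed in the quick-start comments) and checks these invariants before a build. The helper names `write_toy_inputs` and `check_inputs`, the file names, and the L2-normalization of the vectors (ColBERT-style embeddings are usually unit-norm) are assumptions for illustration. The sketch also assumes that each document's tokens are stored contiguously, in doclens order.

```python
import numpy as np


def write_toy_inputs(prefix="toy", n_docs=100, dim=128, vocab_size=1000, seed=0):
    """Write random vectors/token_ids/doclens .npy files in the quick-start layout.

    Hypothetical helper; sizes, file names and the unit-norm step are illustrative.
    """
    rng = np.random.default_rng(seed)
    doclens = rng.integers(5, 50, size=n_docs).astype(np.int32)  # tokens per document
    n_tokens = int(doclens.sum())

    vectors = rng.standard_normal((n_tokens, dim)).astype(np.float32)
    vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)  # assumption: unit-norm rows
    token_ids = rng.integers(0, vocab_size, size=n_tokens).astype(np.int64)

    paths = (f"{prefix}_vectors.npy", f"{prefix}_token_ids.npy", f"{prefix}_doclens.npy")
    np.save(paths[0], vectors.astype(np.float16))  # [N, dim] f16
    np.save(paths[1], token_ids)                   # [N] i64
    np.save(paths[2], doclens)                     # [n_docs] i32
    return paths


def check_inputs(vectors_path, token_ids_path, doclens_path):
    """Validate the three files against each other; return per-document offsets."""
    vectors = np.load(vectors_path, mmap_mode="r")  # avoid reading all vectors into RAM
    token_ids = np.load(token_ids_path)
    doclens = np.load(doclens_path)

    if vectors.ndim != 2 or vectors.dtype != np.float16:
        raise ValueError("vectors must be a [N, dim] float16 array")
    if token_ids.shape != (vectors.shape[0],) or token_ids.dtype != np.int64:
        raise ValueError("token_ids must be an int64 array with one entry per vector")
    if doclens.dtype != np.int32 or int(doclens.sum()) != vectors.shape[0]:
        raise ValueError("doclens must be int32 and sum to the number of vectors")

    # Assumes documents are stored back to back in doclens order, so the
    # tokens of document i are vectors[offsets[i]:offsets[i + 1]].
    return np.concatenate(([0], np.cumsum(doclens, dtype=np.int64)))


if __name__ == "__main__":
    paths = write_toy_inputs()
    offsets = check_inputs(*paths)
    print(f"{len(offsets) - 1} documents, {offsets[-1]} token vectors")
```

The returned paths could then be passed to tachiom.Tachiom.build in place of the quick-start file names. Because this toy set has only a few thousand token vectors, total_centroids would need to be far smaller than the quick-start value.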

Datasets

Pre-processed datasets and pre-built indexes are available on HuggingFace, ready to use with the experiment configs in experiments/sigir2026/.

| Dataset | HuggingFace dataset | Pre-built index |
|---|---|---|
| MS MARCO-v1 (ColBERT v2) | tuskanny/ms_marco_colbertv2 | tachiom_msmarco_4M_normalized |
| LoTTE Pooled (ColBERT v2) | tuskanny/lotte_pooled_colbertv2 | tachiom_lotte_2M_normalized |

Each dataset contains documents.npy, token_ids.npy, doclens.npy, queries.npy, doc_ids.npy, queries_ids.npy, a qrels .tsv file, and a pre-built Tachiom index. Download with:

pip install huggingface_hub
huggingface-cli download tuskanny/ms_marco_colbertv2 --repo-type dataset --local-dir ./ms_marco
huggingface-cli download tuskanny/lotte_pooled_colbertv2 --repo-type dataset --local-dir ./lotte
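For sanity-checking search results on a handful of queries, it can help to compute exact late-interaction scores by brute force. The sketch below uses the ColBERT-style MaxSim score: for each query token, take the maximum dot product over a document's tokens, then sum over query tokens. Document boundaries come from doclens, and the document vectors would presumably come from documents.npy. Whether batch_search reports exactly this quantity or an approximation of it is an assumption to verify against docs/PythonUsage.md. The returned positions follow doclens order, and how they map to the entries of doc_ids.npy is not specified here. The function names are hypothetical.

```python
import numpy as np


def maxsim_scores(query, doc_vectors, doclens):
    """Exact late-interaction (MaxSim) score of one query against every document.

    query:       [n_q_tokens, dim] array of query token embeddings
    doc_vectors: [N, dim] array, documents stored back to back in doclens order
    doclens:     [n_docs] number of tokens per document (all > 0)
    """
    doclens = np.asarray(doclens, dtype=np.int64)
    doc_vectors = np.asarray(doc_vectors, dtype=np.float32)
    if np.any(doclens <= 0) or doclens.sum() != doc_vectors.shape[0]:
        raise ValueError("doclens must be positive and sum to len(doc_vectors)")

    sims = doc_vectors @ np.asarray(query, dtype=np.float32).T  # [N, n_q_tokens]
    starts = np.concatenate(([0], np.cumsum(doclens)[:-1]))
    # For each document and query token, keep the best-matching document token.
    per_doc_max = np.maximum.reduceat(sims, starts, axis=0)    # [n_docs, n_q_tokens]
    return per_doc_max.sum(axis=1)                              # sum over query tokens


def exact_top_k(query, doc_vectors, doclens, k=10):
    """Return (scores, doc_positions) of the k highest-scoring documents, best first."""
    scores = maxsim_scores(query, doc_vectors, doclens)
    k = min(k, scores.shape[0])
    top = np.argpartition(-scores, k - 1)[:k]  # unordered top-k
    top = top[np.argsort(-scores[top])]         # sort the k winners by score
    return scores[top], top
```

Since this materializes an [N, n_q_tokens] similarity matrix over every token vector, it is practical only on a sampled subset of documents, not on the full MS MARCO or LoTTE collections.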

Resources

| Document | Description |
|---|---|
| Python API | Tachiom and Tac classes, all parameters, search guide |
| Rust CLI | bench_tac, tachiom_build, tachiom_search binaries, experiment runner, SIGIR 2026 reproduction |
| Jupyter notebooks | End-to-end demo on TAC and TACHIOM |
| Experiments | TOML configs used for the SIGIR 2026 benchmarks |

License

This software is released under the MIT License (see LICENSE).

Citation license

By downloading and using this software, you agree to cite the following paper in any material you produce where it was used to conduct a search or experimentation, whether it be a research paper, dissertation, article, poster, presentation, or documentation. By using this software, you have agreed to the citation license.

Bibliography

This paper has been accepted at SIGIR 2026. The full proceedings entry will be available after the conference.

@misc{martinico2026efficientmultivectorretrievaltokenaware,
      title={Efficient Multivector Retrieval with Token-Aware Clustering and Hierarchical Indexing}, 
      author={Silvio Martinico and Franco Maria Nardini and Cosimo Rulli and Rossano Venturini},
      year={2026},
      eprint={2604.28142},
      archivePrefix={arXiv},
      primaryClass={cs.IR},
      url={https://arxiv.org/abs/2604.28142}, 
}

Download files

Download the file for your platform.

Source Distribution

  • tachiom-0.2.0.tar.gz (344.0 kB)

Built Distributions

  • tachiom-0.2.0-cp313-cp313-manylinux_2_39_x86_64.whl (1.3 MB): CPython 3.13, manylinux (glibc 2.39+), x86-64
  • tachiom-0.2.0-cp313-cp313-macosx_11_0_arm64.whl (1.2 MB): CPython 3.13, macOS 11.0+, ARM64
  • tachiom-0.2.0-cp313-cp313-macosx_10_12_x86_64.whl (1.2 MB): CPython 3.13, macOS 10.12+, x86-64
  • tachiom-0.2.0-cp312-cp312-manylinux_2_39_x86_64.whl (1.3 MB): CPython 3.12, manylinux (glibc 2.39+), x86-64
  • tachiom-0.2.0-cp312-cp312-macosx_11_0_arm64.whl (1.2 MB): CPython 3.12, macOS 11.0+, ARM64
  • tachiom-0.2.0-cp312-cp312-macosx_10_12_x86_64.whl (1.2 MB): CPython 3.12, macOS 10.12+, x86-64
  • tachiom-0.2.0-cp311-cp311-manylinux_2_39_x86_64.whl (1.3 MB): CPython 3.11, manylinux (glibc 2.39+), x86-64
  • tachiom-0.2.0-cp311-cp311-macosx_11_0_arm64.whl (1.2 MB): CPython 3.11, macOS 11.0+, ARM64
  • tachiom-0.2.0-cp311-cp311-macosx_10_12_x86_64.whl (1.2 MB): CPython 3.11, macOS 10.12+, x86-64
  • tachiom-0.2.0-cp310-cp310-manylinux_2_39_x86_64.whl (1.3 MB): CPython 3.10, manylinux (glibc 2.39+), x86-64
  • tachiom-0.2.0-cp310-cp310-macosx_11_0_arm64.whl (1.2 MB): CPython 3.10, macOS 11.0+, ARM64
  • tachiom-0.2.0-cp310-cp310-macosx_10_12_x86_64.whl (1.2 MB): CPython 3.10, macOS 10.12+, x86-64

File details

Details for the file tachiom-0.2.0.tar.gz.

File metadata

  • Download URL: tachiom-0.2.0.tar.gz
  • Size: 344.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for tachiom-0.2.0.tar.gz
Algorithm Hash digest
SHA256 8c6371bcb2bb4f372e43c7678310a6e88503af8ed317ba9dbd195b2801116cf0
MD5 8e8e3059a62a732d2888fc59638cc645
BLAKE2b-256 cb81d76411e74bd720c49524f53d009b487f9c3f7602c0ad957354d80b52bfcd

File details

Details for the file tachiom-0.2.0-cp313-cp313-manylinux_2_39_x86_64.whl.

File hashes

Hashes for tachiom-0.2.0-cp313-cp313-manylinux_2_39_x86_64.whl
Algorithm Hash digest
SHA256 b0cddb5a620a669ee06ec057f1acbadbd53b4ea5f9ef061561d7d2eaac380021
MD5 4511765cd2eb75293d29e6d57215c150
BLAKE2b-256 0da313d9e51482881b672c411f96cf58ca44acfc626bf23edc9d954177919870

File details

Details for the file tachiom-0.2.0-cp313-cp313-macosx_11_0_arm64.whl.

File hashes

Hashes for tachiom-0.2.0-cp313-cp313-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 7af885894f030e368abcf3f7f9c528ac7f58c2617c34e3abfab9698bb6ef0a5f
MD5 94f449bc6e54d97fb75970a820b41de6
BLAKE2b-256 3ea787e3358d259dc015b138336cc71b5e92a24d43aee4e94887b7762f8c2e9b

File details

Details for the file tachiom-0.2.0-cp313-cp313-macosx_10_12_x86_64.whl.

File hashes

Hashes for tachiom-0.2.0-cp313-cp313-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 07eb268b29e580aa3906c8f1f79ccaa1efeeb059c4c740212da2004795027c62
MD5 cf64474d6a16972a0f18ca3833226267
BLAKE2b-256 334d81a64a22617104796cdeec574902f80c82496bb37225f20177cacfa39805

File details

Details for the file tachiom-0.2.0-cp312-cp312-manylinux_2_39_x86_64.whl.

File hashes

Hashes for tachiom-0.2.0-cp312-cp312-manylinux_2_39_x86_64.whl
Algorithm Hash digest
SHA256 0a79b29ebb92f1d96210eb5158a7f63ae3f53071c9d812a82a352a26315e0ed9
MD5 2f18c937a0eaa125af1efa18e5b0c8cf
BLAKE2b-256 c1ead72c6aaf88716d80b8212fa0618bb928602dc673b7ccf2b08f5f4e8b0e5b

File details

Details for the file tachiom-0.2.0-cp312-cp312-macosx_11_0_arm64.whl.

File hashes

Hashes for tachiom-0.2.0-cp312-cp312-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 df12e1ce60cb8854cc5320c8bdfe5c0dec18e9ac723b22e44b93eab7b87d5ecb
MD5 7ed2919153575337a6025ab51f7145b3
BLAKE2b-256 f7dd120f928634ede623965b6a056e7e2226d099dbb7921c447765abc7bb939d

File details

Details for the file tachiom-0.2.0-cp312-cp312-macosx_10_12_x86_64.whl.

File hashes

Hashes for tachiom-0.2.0-cp312-cp312-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 9865b65fb656742481fa257e9851f575aee0ca17fe2548707f6bfbb2ce5cbefc
MD5 473a1fafdb66832dc473a843d91b2230
BLAKE2b-256 d609f67daf28f7e4138ad492c91ddec934c0d4f81761dd9dfceca8dc93059205

File details

Details for the file tachiom-0.2.0-cp311-cp311-manylinux_2_39_x86_64.whl.

File hashes

Hashes for tachiom-0.2.0-cp311-cp311-manylinux_2_39_x86_64.whl
Algorithm Hash digest
SHA256 68886e6beeeb1de6182c8a58a4e9f01da4a5e828c0ddc79d27da36d07af0123d
MD5 a4607483001814ecf44436239a289f77
BLAKE2b-256 c32579bb00b4eba88df4e303ba5a6cd2e18f378bdac3c3f1d45194a0ef8bb8de

File details

Details for the file tachiom-0.2.0-cp311-cp311-macosx_11_0_arm64.whl.

File hashes

Hashes for tachiom-0.2.0-cp311-cp311-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 b6fcc3ef42b68722fbd48b00a9ab14d12869848ae0d86e853c123730ce75df33
MD5 2da20606440e2d7ebc2cd865a8a0159f
BLAKE2b-256 8813c30e23cf1788b63e15a098f292d9d1dfdf800134f04452e7b06c8a34877c

File details

Details for the file tachiom-0.2.0-cp311-cp311-macosx_10_12_x86_64.whl.

File hashes

Hashes for tachiom-0.2.0-cp311-cp311-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 01b79820ae1c0fdc062d00b8c16ab953b30b86b36413b0c0f93d309fe391928d
MD5 37afc5ceab7fda8eefad48da989f6c69
BLAKE2b-256 23c0f1b8e933ef7e2887855a45959f9055706fdc0cbb9f674abcee2e246deaed

File details

Details for the file tachiom-0.2.0-cp310-cp310-manylinux_2_39_x86_64.whl.

File hashes

Hashes for tachiom-0.2.0-cp310-cp310-manylinux_2_39_x86_64.whl
Algorithm Hash digest
SHA256 e39a9a8f25917a9646ffce0f992231baed17e881679bf4e48991bc3d6a9dc71c
MD5 dfe5fff5d5a12d64f49cc0e93cb056d4
BLAKE2b-256 9a29536bb3856bca750100760c7056a87dcc9e657c72832eb0257069794c1ae1

File details

Details for the file tachiom-0.2.0-cp310-cp310-macosx_11_0_arm64.whl.

File hashes

Hashes for tachiom-0.2.0-cp310-cp310-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 78ffdd530bf3bc133a0b65643157a95bc5af07fcd34aeeacf250be927eaae2b3
MD5 795463f69be92f4b744780594b38f7a0
BLAKE2b-256 9168e19c74a19f1fdf58b3dc3248930a02fa2c47f7643ae9ba5ac3afe1bfe806

File details

Details for the file tachiom-0.2.0-cp310-cp310-macosx_10_12_x86_64.whl.

File hashes

Hashes for tachiom-0.2.0-cp310-cp310-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 9ac663cb9426b554424a42d6bc856b04a637eabefc439ee4029ddc0ff9569316
MD5 9720bf84ab658df464d50d195f89c797
BLAKE2b-256 d113c88daef8424c6ad9ca54f7bd8c15d77485d0117a4c6f8a0107c5c4190261
