State-of-the-art index for late-interaction multivector retrieval
TACHIOM
TACHIOM is a fast and scalable data structure for late-interaction multi-vector retrieval, written in Rust with Python bindings. It introduces Token-Aware Clustering (TAC), which distributes the coarse-centroid budget proportionally across token types, and a hierarchical Product Quantization scheme for efficient candidate reranking.
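To make the idea of a proportional centroid budget concrete, here is a minimal sketch. It is not TACHIOM's implementation. It assumes that "proportionally" means proportional to how often each token id occurs in the corpus, and it uses largest-remainder rounding so the per-token shares add up to the budget exactly. The helper name `allocate_centroids` and both of those choices are assumptions made for illustration.

```python
from collections import Counter


def allocate_centroids(token_ids, total_centroids):
    """Split `total_centroids` across token ids in proportion to their frequency.

    Hypothetical helper for illustration only: the frequency-proportional rule
    and the largest-remainder rounding are assumptions, not TACHIOM's code.
    """
    counts = Counter(token_ids)
    n_tokens = sum(counts.values())
    # Ideal (fractional) share of the budget for every token id.
    ideal = {tok: total_centroids * c / n_tokens for tok, c in counts.items()}
    # Start from the integer part of each share ...
    budget = {tok: int(share) for tok, share in ideal.items()}
    # ... then hand the leftover centroids to the largest fractional parts.
    leftover = total_centroids - sum(budget.values())
    by_remainder = sorted(ideal, key=lambda tok: ideal[tok] - budget[tok], reverse=True)
    for tok in by_remainder[:leftover]:
        budget[tok] += 1
    return budget


# Toy corpus: token id 7 occurs 5 times, id 3 occurs 3 times, id 9 twice.
toy_token_ids = [7] * 5 + [3] * 3 + [9] * 2
print(allocate_centroids(toy_token_ids, total_centroids=4))
# prints {7: 2, 3: 1, 9: 1}
```

Note that this sketch can assign zero centroids to very rare token ids when the budget is small; how TACHIOM treats such tokens is not covered here.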
Installation
Python
Quick start (prebuilt wheels)
For most users, this is the easiest option:
pip install tachiom
If a compatible wheel exists for your platform, pip downloads and installs it directly, with no compilation. Otherwise pip falls back to compiling from source, which needs the Rust nightly toolchain described under "Building from source" below.
Building from source (maximum performance)
For maximum performance optimized to your CPU, build from source.
Shared prerequisites — both approaches below require Rust nightly:
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
rustup install nightly
rustup default nightly
Approach 1 — compile from PyPI source:
RUSTFLAGS="-C target-cpu=native" pip install --no-binary :all: tachiom
Approach 2 — build from GitHub (development/editable mode):
git clone https://github.com/TusKANNy/tachiom.git
cd tachiom
Create a virtual environment (recommended):
python3 -m venv ./venv
source ./venv/bin/activate # On Windows: venv\Scripts\activate
Or with conda:
conda create -n tachiom python=3.11
conda activate tachiom
Install maturin and build:
pip install maturin
RUSTFLAGS="-C target-cpu=native" maturin develop --release
Changes to Python code take effect immediately without reinstalling — ideal for development.
Rust
To compile all the Rust binaries in src/bin/:
RUSTFLAGS="-C target-cpu=native" cargo build --release
Details on how to use the TACHIOM Rust CLI can be found in docs/RustUsage.md.
Quick start
import tachiom
# ── Build ─────────────────────────────────────────────────────────────────────
# Inputs (all .npy files):
#   vectors.npy    [N, dim]   f16   one row per token
#   token_ids.npy  [N]        i64   vocabulary id of each token
#   doclens.npy    [n_docs]   i32   number of tokens per document
index = tachiom.Tachiom.build(
    "vectors.npy",
    "token_ids.npy",
    "doclens.npy",
    total_centroids=2_097_152,
)
index.save("my_index.bin")
# ── Load & search ─────────────────────────────────────────────────────────────
index = tachiom.Tachiom.load("my_index.bin")
# queries: [n_queries, n_tokens, dim] f32 array, prepared by the caller (not created in this snippet)
scores, doc_ids = index.batch_search(queries, k=10, num_threads=0)
# scores, doc_ids: [n_queries, k]
See docs/PythonUsage.md for the full API, all build and search parameters, and the two-step TAC workflow.
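For orientation, the sketch below shows the conventional late-interaction score used by ColBERT-style models, often called MaxSim: each query token is matched to its best document token by dot product, and the per-token maxima are summed. This README does not spell out TACHIOM's scoring function, so treating MaxSim as the quantity the index approximates is an assumption. The functions `maxsim_score` and `brute_force_top_k` are hypothetical reference helpers and are not part of the tachiom API.

```python
def maxsim_score(query_vecs, doc_vecs):
    """MaxSim: for each query token take its best dot product against the
    document's tokens, then sum those maxima over all query tokens."""
    total = 0.0
    for q in query_vecs:
        best = max(sum(qi * di for qi, di in zip(q, d)) for d in doc_vecs)
        total += best
    return total


def brute_force_top_k(query_vecs, docs, k):
    """Score every document exhaustively; return (scores, doc_ids), best first."""
    scored = sorted(
        ((maxsim_score(query_vecs, doc), doc_id) for doc_id, doc in enumerate(docs)),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    return [s for s, _ in scored], [i for _, i in scored]


# One query with two token vectors, three tiny documents (dim = 2).
query = [[1.0, 0.0], [0.0, 1.0]]
docs = [
    [[1.0, 0.0], [0.5, 0.5]],  # doc 0
    [[0.0, 1.0], [1.0, 0.0]],  # doc 1
    [[0.5, 0.5]],              # doc 2
]
scores, doc_ids = brute_force_top_k(query, docs, k=2)
print(scores, doc_ids)
# prints [2.0, 1.5] [1, 0]
```

A brute-force scorer like this is only practical on small collections, but it can serve as a ground truth when checking the results of an approximate index.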
Datasets
Pre-processed datasets and pre-built indexes are available on HuggingFace, ready to use with the experiment configs in experiments/sigir2026/.
| Dataset | HuggingFace | Index |
|---|---|---|
| MS MARCO-v1 (ColBERT v2) | tuskanny/ms_marco_colbertv2 | tachiom_msmarco_4M_normalized |
| LoTTE Pooled (ColBERT v2) | tuskanny/lotte_pooled_colbertv2 | tachiom_lotte_2M_normalized |
Each dataset contains documents.npy, token_ids.npy, doclens.npy, queries.npy, doc_ids.npy, queries_ids.npy, a qrels .tsv file, and a pre-built Tachiom index. Download with:
pip install huggingface_hub
huggingface-cli download tuskanny/ms_marco_colbertv2 --repo-type dataset --local-dir ./ms_marco
huggingface-cli download tuskanny/lotte_pooled_colbertv2 --repo-type dataset --local-dir ./lotte
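The dataset files listed above include doclens.npy alongside per-token arrays. A minimal sketch of how such arrays typically fit together follows, assuming token rows are stored contiguously in document order (an assumption; the README only states that doclens holds the number of tokens per document). It uses plain Python lists in place of the .npy arrays, and `doc_slices` is a hypothetical helper.

```python
from itertools import accumulate


def doc_slices(doclens):
    """Turn per-document token counts into (start, end) row ranges."""
    ends = list(accumulate(doclens))
    starts = [0] + ends[:-1]
    return list(zip(starts, ends))


doclens = [3, 1, 2]                    # three documents, 6 token rows in total
token_ids = [11, 12, 13, 21, 31, 32]   # one vocabulary id per token row

slices = doc_slices(doclens)
# Sanity check: the counts must cover every token row exactly once.
assert slices[-1][1] == len(token_ids), "doclens must sum to the number of token rows"

start, end = slices[2]
tokens_of_doc_2 = token_ids[start:end]
print(slices, tokens_of_doc_2)
# prints [(0, 3), (3, 4), (4, 6)] [31, 32]
```

The same row ranges would apply to the token vectors, since the build inputs use one vector row and one token id per token.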
Resources
| Document | Description |
|---|---|
| Python API | Tachiom and Tac classes, all parameters, search guide |
| Rust CLI | bench_tac, tachiom_build, tachiom_search binaries, experiment runner, SIGIR 2026 reproduction |
| Jupyter notebooks | End-to-end demo on TAC and TACHIOM |
| Experiments | TOML configs used for the SIGIR 2026 benchmarks |
License
This software is released under the MIT License (see LICENSE).
Citation license
By downloading and using this software, you agree to cite the following paper in any material you produce where it was used to conduct a search or experimentation, whether it be a research paper, dissertation, article, poster, presentation, or documentation. By using this software, you have agreed to the citation license.
Bibliography
This paper has been accepted at SIGIR 2026. The full proceedings entry will be available after the conference.
@misc{martinico2026efficientmultivectorretrievaltokenaware,
  title={Efficient Multivector Retrieval with Token-Aware Clustering and Hierarchical Indexing},
  author={Silvio Martinico and Franco Maria Nardini and Cosimo Rulli and Rossano Venturini},
  year={2026},
  eprint={2604.28142},
  archivePrefix={arXiv},
  primaryClass={cs.IR},
  url={https://arxiv.org/abs/2604.28142},
}