State-of-the-art index for late-interaction multivector retrieval

TACHIOM

TACHIOM is a fast and scalable data structure for late-interaction multi-vector retrieval, written in Rust with Python bindings. It introduces Token-Aware Clustering (TAC), which distributes the coarse-centroid budget proportionally across token types, and a hierarchical Product Quantization scheme for efficient candidate reranking.
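To make the idea of a proportional centroid budget concrete, here is a minimal sketch. It assumes each token type receives a share of total_centroids proportional to how often it occurs in the collection, rounded with the largest-remainder method. The exact allocation rule used by TAC is not spelled out in this README and may differ. allocate_centroids is a hypothetical helper for illustration, not part of the tachiom API.

```python
from collections import Counter


def allocate_centroids(token_ids, total_centroids):
    """Split total_centroids across token types in proportion to frequency.

    Illustrative only: assumes a frequency-proportional rule with
    largest-remainder rounding, which may not match TAC exactly.
    """
    counts = Counter(token_ids)
    n_tokens = sum(counts.values())
    if n_tokens == 0:
        return {}
    # Ideal (fractional) share of the budget for each token type.
    exact = {t: total_centroids * c / n_tokens for t, c in counts.items()}
    # Start from the integer part of each share.
    budget = {t: int(share) for t, share in exact.items()}
    # Give the centroids lost to rounding to the largest fractional remainders.
    leftover = total_centroids - sum(budget.values())
    by_remainder = sorted(
        exact, key=lambda t: (exact[t] - budget[t], counts[t]), reverse=True
    )
    for t in by_remainder[:leftover]:
        budget[t] += 1
    return budget


# Toy collection: four token types with frequencies 4, 3, 2 and 1.
toy_token_ids = [101, 101, 101, 101, 2054, 2054, 2054, 3000, 3000, 102]
toy_budget = allocate_centroids(toy_token_ids, total_centroids=8)
```

Under this assumed rule, frequent token types get more centroids while the allocations still sum exactly to the requested budget.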

Installation

Python

Quick start (prebuilt wheels)

For most users, this is the easiest option:

pip install tachiom

If a compatible wheel exists for your platform, pip will download and install it directly without compilation. If no compatible wheel exists, pip will automatically compile from source.

Building from source (maximum performance)

To get a build tuned to your CPU's instruction set, compile from source with target-cpu=native.

Shared prerequisites — both approaches below require Rust nightly:

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
rustup install nightly
rustup default nightly

Approach 1 — compile from PyPI source:

RUSTFLAGS="-C target-cpu=native" pip install --no-binary :all: tachiom

Approach 2 — build from GitHub (development/editable mode):

git clone https://github.com/TusKANNy/tachiom.git
cd tachiom

Create a virtual environment (recommended):

python3 -m venv ./venv
source ./venv/bin/activate  # On Windows: venv\Scripts\activate

Or with conda:

conda create -n tachiom python=3.11
conda activate tachiom

Install maturin and build:

pip install maturin
RUSTFLAGS="-C target-cpu=native" maturin develop --release

Changes to Python code take effect immediately without reinstalling, which is ideal for development. Changes to the Rust code require rerunning maturin develop.

Rust

To compile all the Rust binaries in src/bin/:

RUSTFLAGS="-C target-cpu=native" cargo build --release

Details on how to use the TACHIOM Rust CLI can be found in docs/RustUsage.md.

Quick start

import numpy as np
import tachiom

# ── Build ─────────────────────────────────────────────────────────────────────
# Inputs (all .npy files):
#   vectors.npy    — [N, dim]   f16  one row per token
#   token_ids.npy  — [N]        i64  vocabulary id of each token
#   doclens.npy    — [n_docs]   i32  number of tokens per document

index = tachiom.Tachiom.build(
    "vectors.npy",
    "token_ids.npy",
    "doclens.npy",
    total_centroids=2_097_152,
)
index.save("my_index.bin")

# ── Load & search ─────────────────────────────────────────────────────────────
index = tachiom.Tachiom.load("my_index.bin")

# queries: [n_queries, n_tokens, dim] f32 array
# (assumed here to be stored in queries.npy, as in the datasets below)
queries = np.load("queries.npy").astype(np.float32)
scores, doc_ids = index.batch_search(queries, k=10, num_threads=0)
# scores, doc_ids: [n_queries, k]
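For a quick smoke test without a real encoder, the three input files can be generated with NumPy. The sketch below writes tiny random arrays in the layouts listed in the build comments. It assumes that the entries of doclens sum to N, so that consecutive rows of vectors.npy belong to the same document. write_toy_inputs, the sizes, and the random values are placeholders for illustration, not part of the tachiom API.

```python
import os
import tempfile

import numpy as np


def write_toy_inputs(out_dir, doclens, dim=16, vocab_size=1000, seed=0):
    """Write vectors.npy, token_ids.npy and doclens.npy with random content.

    dim and vocab_size are arbitrary placeholder values.
    """
    rng = np.random.default_rng(seed)
    doclens = np.asarray(doclens, dtype=np.int32)   # [n_docs] i32
    n_tokens = int(doclens.sum())                   # N = total number of tokens
    # [N, dim] f16, one row per token.
    vectors = rng.standard_normal((n_tokens, dim)).astype(np.float16)
    # [N] i64, vocabulary id of each token.
    token_ids = rng.integers(0, vocab_size, size=n_tokens, dtype=np.int64)

    paths = {}
    arrays = (("vectors", vectors), ("token_ids", token_ids), ("doclens", doclens))
    for name, arr in arrays:
        paths[name] = os.path.join(out_dir, f"{name}.npy")
        np.save(paths[name], arr)
    return paths


# Three toy documents with 5, 3 and 7 tokens.
toy_dir = tempfile.mkdtemp()
toy_paths = write_toy_inputs(toy_dir, doclens=[5, 3, 7], dim=16)
```

The returned paths can then be passed to tachiom.Tachiom.build in place of the file names above, with a total_centroids value small enough for the toy collection.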

See docs/PythonUsage.md for the full API, all build and search parameters, and the two-step TAC workflow.
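When checking search results on a small collection, a brute-force reference can help. The sketch below assumes the ColBERT-style MaxSim score commonly used in late-interaction retrieval: for each query token, take the maximum dot product over a document's tokens, then sum over query tokens. Whether the scores returned by batch_search match this definition exactly is an assumption, since the index relies on clustering and quantization. maxsim_scores is a hypothetical helper, not part of the tachiom API.

```python
import numpy as np


def maxsim_scores(query, vectors, doclens):
    """Brute-force MaxSim of one query against every document.

    query:   [n_tokens, dim] query token embeddings
    vectors: [N, dim] document token embeddings, documents stored contiguously
    doclens: [n_docs] tokens per document (all entries must be > 0)
    Returns a [n_docs] array of scores.
    """
    # Dot product of every query token with every document token: [n_tokens, N].
    sims = query.astype(np.float32) @ vectors.astype(np.float32).T
    # Index of the first token of each document.
    starts = np.concatenate(([0], np.cumsum(doclens)[:-1])).astype(np.intp)
    # Max over each document's tokens, per query token: [n_tokens, n_docs].
    per_doc_max = np.maximum.reduceat(sims, starts, axis=1)
    # Sum the per-token maxima over the query tokens.
    return per_doc_max.sum(axis=0)


# Toy example: two documents with 2 and 1 tokens, one query with 2 tokens.
toy_query = np.array([[1.0, 0.0], [0.0, 1.0]], dtype=np.float32)
toy_vectors = np.array([[1.0, 0.0], [0.0, 2.0], [0.5, 0.5]], dtype=np.float16)
toy_scores = maxsim_scores(toy_query, toy_vectors, doclens=[2, 1])
toy_top = np.argsort(-toy_scores)[:1]  # index of the best-scoring document
```

This runs in time linear in the total number of document tokens per query, so it is only practical as a correctness baseline on small data.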

Datasets

Pre-processed datasets and pre-built indexes are available on HuggingFace, ready to use with the experiment configs in experiments/sigir2026/.

Dataset                   | HuggingFace                     | Index
MS MARCO-v1 (ColBERT v2)  | tuskanny/ms_marco_colbertv2     | tachiom_msmarco_4M_normalized
LoTTE Pooled (ColBERT v2) | tuskanny/lotte_pooled_colbertv2 | tachiom_lotte_2M_normalized

Each dataset contains documents.npy, token_ids.npy, doclens.npy, queries.npy, doc_ids.npy, queries_ids.npy, a qrels .tsv file, and a pre-built Tachiom index. Download with:

pip install huggingface_hub
huggingface-cli download tuskanny/ms_marco_colbertv2 --repo-type dataset --local-dir ./ms_marco
huggingface-cli download tuskanny/lotte_pooled_colbertv2 --repo-type dataset --local-dir ./lotte
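After a download, it can be worth confirming that the expected files arrived before starting an experiment. The sketch below checks a local directory for the .npy files listed above. It assumes they sit at the top level of the download directory, which may not match the actual repository layout. missing_dataset_files is a hypothetical helper, not part of tachiom or huggingface_hub.

```python
import tempfile
from pathlib import Path

# File names taken from the dataset description above.
EXPECTED_FILES = (
    "documents.npy",
    "token_ids.npy",
    "doclens.npy",
    "queries.npy",
    "doc_ids.npy",
    "queries_ids.npy",
)


def missing_dataset_files(local_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are not present in local_dir."""
    root = Path(local_dir)
    return [name for name in expected if not (root / name).is_file()]


# Demo on a temporary directory containing only two of the six files.
demo_dir = Path(tempfile.mkdtemp())
for name in ("documents.npy", "doclens.npy"):
    (demo_dir / name).touch()
demo_missing = missing_dataset_files(demo_dir)
```

For a real download, call missing_dataset_files("./ms_marco") or missing_dataset_files("./lotte"); an empty list means every expected file was found.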

Resources

Document          | Description
Python API        | Tachiom and Tac classes, all parameters, search guide
Rust CLI          | bench_tac, tachiom_build, tachiom_search binaries, experiment runner, SIGIR 2026 reproduction
Jupyter notebooks | End-to-end demo on TAC and TACHIOM
Experiments       | TOML configs used for the SIGIR 2026 benchmarks

License

This software is released under the MIT License (see LICENSE).

Citation license

By downloading and using this software, you agree to cite the following paper in any material you produce where it was used to conduct a search or experimentation, whether it be a research paper, dissertation, article, poster, presentation, or documentation. By using this software, you have agreed to the citation license.

Bibliography

This paper has been accepted at SIGIR 2026. The full proceedings entry will be available after the conference.

@misc{martinico2026efficientmultivectorretrievaltokenaware,
      title={Efficient Multivector Retrieval with Token-Aware Clustering and Hierarchical Indexing}, 
      author={Silvio Martinico and Franco Maria Nardini and Cosimo Rulli and Rossano Venturini},
      year={2026},
      eprint={2604.28142},
      archivePrefix={arXiv},
      primaryClass={cs.IR},
      url={https://arxiv.org/abs/2604.28142}, 
}

