State-of-the-art index for late-interaction multivector retrieval

TACHIOM

TACHIOM is a fast and scalable data structure for late-interaction multi-vector retrieval, written in Rust with Python bindings. It introduces Token-Aware Clustering (TAC), which distributes the coarse-centroid budget proportionally across token types, and a hierarchical Product Quantization scheme for efficient candidate reranking.
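To make the idea of a per-token-type centroid budget concrete, here is a minimal sketch of one way such a split could be computed. It is not taken from the TACHIOM source: the helper name allocate_centroids, the choice of token frequency as the proportionality weight, and the one-centroid minimum per type are all assumptions made for illustration.

```python
from collections import Counter


def allocate_centroids(token_ids, total_centroids):
    """Split a centroid budget across token types in proportion to frequency.

    Hypothetical helper: assumes "proportionally" means proportional to how
    often each token id occurs, and gives every observed type at least one
    centroid. Not the library's actual algorithm.
    """
    counts = Counter(token_ids)
    if total_centroids < len(counts):
        raise ValueError("budget smaller than the number of token types")

    n_tokens = sum(counts.values())
    # Reserve one centroid per type, then share out the rest proportionally.
    spare = total_centroids - len(counts)
    exact = {t: spare * c / n_tokens for t, c in counts.items()}
    budget = {t: 1 + int(x) for t, x in exact.items()}

    # Largest-remainder rounding so the allocation sums exactly to the budget.
    leftover = total_centroids - sum(budget.values())
    by_remainder = sorted(exact, key=lambda t: exact[t] - int(exact[t]), reverse=True)
    for t in by_remainder[:leftover]:
        budget[t] += 1
    return budget


# Toy vocabulary ids: 2054 is the most frequent type, so it gets the most centroids.
token_ids = [101, 2054, 2054, 2003, 2054, 2003, 102, 2054]
budget = allocate_centroids(token_ids, total_centroids=16)
print(budget)
```

Frequent token types receive more centroids than rare ones, and the allocation always sums to the requested total.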

Installation

Python

Quick start (prebuilt wheels)

For most users, this is the easiest option:

pip install tachiom

If a compatible wheel exists for your platform, pip downloads and installs it directly, with no compilation. Otherwise, pip falls back to compiling the source distribution, which requires the Rust nightly toolchain described below.
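Whether a wheel is "compatible" depends on the tags encoded in its file name. As a rough illustration, the sketch below splits a wheel file name into its standard components and prints the CPython tag of the running interpreter. The helper names parse_wheel_filename and interpreter_tag are hypothetical, the example file name is one from this release's file list, and pip's real compatibility check is more involved than this comparison.

```python
import sys
import sysconfig


def parse_wheel_filename(filename):
    """Split a wheel file name into its components.

    Wheel names follow name-version[-build]-python-abi-platform.whl.
    """
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file name")
    parts = filename[: -len(".whl")].split("-")
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {
        "name": parts[0],
        "version": parts[1],
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


def interpreter_tag():
    """CPython-style tag for the running interpreter, e.g. cp312 (CPython only)."""
    return f"cp{sys.version_info.major}{sys.version_info.minor}"


info = parse_wheel_filename("tachiom-0.2.2-cp313-cp313-manylinux_2_39_x86_64.whl")
print(info)
print("this interpreter:", interpreter_tag(), "on", sysconfig.get_platform())
```

If the interpreter tag or the platform does not correspond to any published wheel, expect pip to build from source.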

Building from source (maximum performance)

To get a build optimized for your CPU's instruction set, compile from source with target-cpu=native.

Shared prerequisites — both approaches below require Rust nightly:

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
rustup install nightly
rustup default nightly

Approach 1 — compile from PyPI source:

RUSTFLAGS="-C target-cpu=native" pip install --no-binary tachiom tachiom

Approach 2 — build from GitHub (development/editable mode):

git clone https://github.com/TusKANNy/tachiom.git
cd tachiom

Create a virtual environment (recommended):

python3 -m venv ./venv
source ./venv/bin/activate  # On Windows: venv\Scripts\activate

Or with conda:

conda create -n tachiom python=3.11
conda activate tachiom

Install maturin and build:

pip install maturin
RUSTFLAGS="-C target-cpu=native" maturin develop --release

Changes to Python code take effect immediately without reinstalling, which is ideal for development. Changes to Rust code require re-running maturin develop.

Rust

To compile all the Rust binaries in src/bin/:

RUSTFLAGS="-C target-cpu=native" cargo build --release

Details on how to use Tachiom's Rust CLI can be found in docs/RustUsage.md.

Quick start

import numpy as np
import tachiom

# ── Build ─────────────────────────────────────────────────────────────────────
# Inputs (all .npy files):
#   vectors.npy    — [N, dim]   f16  one row per token
#   token_ids.npy  — [N]        i64  vocabulary id of each token
#   doclens.npy    — [n_docs]   i32  number of tokens per document

index = tachiom.Tachiom.build(
    "vectors.npy",
    "token_ids.npy",
    "doclens.npy",
    total_centroids=2_097_152,
)
index.save("my_index.bin")

# ── Load & search ─────────────────────────────────────────────────────────────
index = tachiom.Tachiom.load("my_index.bin")

# queries: [n_queries, n_tokens, dim] f32 array.
# Assumption: loaded here from a queries.npy file, such as the one shipped with the datasets below.
queries = np.load("queries.npy").astype(np.float32)
scores, doc_ids = index.batch_search(queries, k=10, num_threads=0)
# scores, doc_ids: [n_queries, k]

See docs/PythonUsage.md for the full API, all build and search parameters, and the two-step TAC workflow.
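For readers new to late interaction, the following brute-force reference shows the standard ColBERT-style MaxSim score that multi-vector indexes are generally built to approximate. It is a sketch under that assumption: the functions maxsim and brute_force_search are illustrative names, they use plain Python lists rather than the library's arrays, and this README does not state whether the scores returned by batch_search match this formula exactly.

```python
def maxsim(query, doc):
    """Late-interaction score of one document.

    For each query token, take its best-matching document token (maximum dot
    product), then sum those maxima over all query tokens.
    """
    return sum(
        max(sum(q * d for q, d in zip(q_vec, d_vec)) for d_vec in doc)
        for q_vec in query
    )


def brute_force_search(query, docs, k):
    """Score every document exhaustively and return the top-k (score, doc_id) pairs."""
    scored = [(maxsim(query, doc), doc_id) for doc_id, doc in enumerate(docs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# One query with two token vectors, three documents of different lengths.
query = [[1.0, 0.0], [0.0, 1.0]]
docs = [
    [[1.0, 0.0], [0.5, 0.5]],
    [[0.0, 1.0], [1.0, 0.0], [0.2, 0.1]],
    [[0.5, 0.5]],
]
top = brute_force_search(query, docs, k=2)
print(top)
```

Exhaustive scoring like this costs time proportional to the total number of document tokens per query, which is the cost an index is meant to avoid.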

Datasets

Pre-processed datasets and pre-built indexes are available on HuggingFace, ready to use with the experiment configs in experiments/sigir2026/.

Dataset | HuggingFace | Index
MS MARCO-v1 (ColBERT v2) | tuskanny/ms_marco_colbertv2 | tachiom_msmarco_4M_normalized
LoTTE Pooled (ColBERT v2) | tuskanny/lotte_pooled_colbertv2 | tachiom_lotte_2M_normalized

Each dataset contains documents.npy, token_ids.npy, doclens.npy, queries.npy, doc_ids.npy, queries_ids.npy, a qrels .tsv file, and a pre-built Tachiom index. Download with:

pip install huggingface_hub
huggingface-cli download tuskanny/ms_marco_colbertv2 --repo-type dataset --local-dir ./ms_marco
huggingface-cli download tuskanny/lotte_pooled_colbertv2 --repo-type dataset --local-dir ./lotte
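The token matrix and doclens.npy together describe a flat layout: one row per token, plus the number of tokens in each document. The sketch below shows how per-document row ranges could be recovered from the lengths. It assumes rows are stored contiguously in document order, which the file descriptions suggest but do not state outright, and the helpers doc_offsets and doc_of_row are hypothetical names, not part of the tachiom API.

```python
from bisect import bisect_right
from itertools import accumulate


def doc_offsets(doclens):
    """Return start offsets: document i owns rows offsets[i]:offsets[i + 1]."""
    return [0, *accumulate(doclens)]


def doc_of_row(row, offsets):
    """Map a token row index back to the document that contains it."""
    if not 0 <= row < offsets[-1]:
        raise IndexError("row outside the token matrix")
    # The owning document is the last one whose start offset is <= row.
    return bisect_right(offsets, row) - 1


doclens = [3, 1, 4]  # three documents, eight token rows in total
offsets = doc_offsets(doclens)
print(offsets)
print([doc_of_row(r, offsets) for r in range(offsets[-1])])
```

A quick sanity check before building an index is that the sum of the document lengths equals the number of rows in both the vector matrix and token_ids.npy.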

Resources

Document | Description
Python API | Tachiom and Tac classes, all parameters, search guide
Rust CLI | bench_tac, tachiom_build, tachiom_search binaries, experiment runner, SIGIR 2026 reproduction
Jupyter notebooks | End-to-end demo on TAC and TACHIOM
Experiments | TOML configs used for the SIGIR 2026 benchmarks

License

This software is released under the MIT License (see LICENSE).

Citation license

By downloading and using this software, you agree to cite the following paper in any material you produce where it was used to conduct a search or experimentation, whether it be a research paper, dissertation, article, poster, presentation, or documentation. By using this software, you have agreed to the citation license.

Bibliography

This paper has been accepted at SIGIR 2026. The full proceedings entry will be available after the conference.

@misc{martinico2026efficientmultivectorretrievaltokenaware,
      title={Efficient Multivector Retrieval with Token-Aware Clustering and Hierarchical Indexing}, 
      author={Silvio Martinico and Franco Maria Nardini and Cosimo Rulli and Rossano Venturini},
      year={2026},
      eprint={2604.28142},
      archivePrefix={arXiv},
      primaryClass={cs.IR},
      url={https://arxiv.org/abs/2604.28142}, 
}

Download files

Source Distribution

tachiom-0.2.2.tar.gz (345.7 kB), source

Built Distributions

File | Size | Python | Platform
tachiom-0.2.2-cp313-cp313-manylinux_2_39_x86_64.whl | 1.3 MB | CPython 3.13 | manylinux: glibc 2.39+ x86-64
tachiom-0.2.2-cp313-cp313-macosx_11_0_arm64.whl | 1.2 MB | CPython 3.13 | macOS 11.0+ ARM64
tachiom-0.2.2-cp313-cp313-macosx_10_12_x86_64.whl | 1.2 MB | CPython 3.13 | macOS 10.12+ x86-64
tachiom-0.2.2-cp312-cp312-manylinux_2_39_x86_64.whl | 1.3 MB | CPython 3.12 | manylinux: glibc 2.39+ x86-64
tachiom-0.2.2-cp312-cp312-macosx_11_0_arm64.whl | 1.2 MB | CPython 3.12 | macOS 11.0+ ARM64
tachiom-0.2.2-cp312-cp312-macosx_10_12_x86_64.whl | 1.2 MB | CPython 3.12 | macOS 10.12+ x86-64
tachiom-0.2.2-cp311-cp311-manylinux_2_39_x86_64.whl | 1.3 MB | CPython 3.11 | manylinux: glibc 2.39+ x86-64
tachiom-0.2.2-cp311-cp311-macosx_11_0_arm64.whl | 1.2 MB | CPython 3.11 | macOS 11.0+ ARM64
tachiom-0.2.2-cp311-cp311-macosx_10_12_x86_64.whl | 1.2 MB | CPython 3.11 | macOS 10.12+ x86-64
tachiom-0.2.2-cp310-cp310-manylinux_2_39_x86_64.whl | 1.3 MB | CPython 3.10 | manylinux: glibc 2.39+ x86-64
tachiom-0.2.2-cp310-cp310-macosx_11_0_arm64.whl | 1.2 MB | CPython 3.10 | macOS 11.0+ ARM64
tachiom-0.2.2-cp310-cp310-macosx_10_12_x86_64.whl | 1.2 MB | CPython 3.10 | macOS 10.12+ x86-64
