IVF-PQ index for late-interaction multivector retrieval

Tachiom

TACHIOM is a fast and scalable data structure for late-interaction multi-vector retrieval, written in Rust with Python bindings. It introduces Token-Aware Clustering (TAC), which distributes the coarse-centroid budget proportionally across token types, and a hierarchical Product Quantization scheme for efficient candidate reranking.
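To make the idea of a proportional centroid budget concrete, here is a minimal sketch that splits a fixed number of centroids across token types according to how often each type occurs. The helper name allocate_centroids, the floor of one centroid per type, and the largest-remainder rounding are all assumptions made for illustration; the rule TACHIOM actually uses may differ, so refer to the paper and docs/PythonUsage.md for the real behavior.

```python
from collections import Counter


def allocate_centroids(token_ids, total_centroids):
    """Split a centroid budget across token types in proportion to frequency.

    Hypothetical helper: the one-centroid floor and the largest-remainder
    rounding are illustrative assumptions, not TACHIOM's documented rule.
    """
    counts = Counter(token_ids)
    if total_centroids < len(counts):
        raise ValueError("budget must cover at least one centroid per token type")
    n_tokens = sum(counts.values())

    # Reserve one centroid per type, then share the rest proportionally.
    spare = total_centroids - len(counts)
    exact = {t: spare * c / n_tokens for t, c in counts.items()}
    budget = {t: 1 + int(share) for t, share in exact.items()}

    # Hand out the centroids lost to rounding, largest fractional part first.
    leftover = total_centroids - sum(budget.values())
    by_remainder = sorted(exact, key=lambda t: exact[t] - int(exact[t]), reverse=True)
    for t in by_remainder[:leftover]:
        budget[t] += 1
    return budget


# Toy corpus: token type 7 occurs 60 times, type 8 30 times, type 9 10 times.
token_ids = [7] * 60 + [8] * 30 + [9] * 10
budget = allocate_centroids(token_ids, total_centroids=10)
print(budget)  # {7: 5, 8: 3, 9: 2}
```

Frequent token types receive more centroids, while rare types still get at least one, so the total always matches the requested budget.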

Installation

Python

Quick start (prebuilt wheels)

For most users, this is the easiest option:

pip install tachiom

If a compatible wheel exists for your platform, pip downloads and installs it directly, with no compilation. Otherwise, pip falls back to compiling from the source distribution, which needs the Rust toolchain described under "Building from source".

Building from source (maximum performance)

For a build optimized for your own CPU, compile from source with the target-cpu=native flag, as in the commands below.

Shared prerequisites — both approaches below require Rust nightly:

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
rustup install nightly
rustup default nightly

Approach 1 — compile from PyPI source:

RUSTFLAGS="-C target-cpu=native" pip install --no-binary :all: tachiom

Approach 2 — build from GitHub (development/editable mode):

git clone https://github.com/TusKANNy/tachiom.git
cd tachiom

Create a virtual environment (recommended):

python3 -m venv ./venv
source ./venv/bin/activate  # On Windows: venv\Scripts\activate

Or with conda:

conda create -n tachiom python=3.11
conda activate tachiom

Install maturin and build:

pip install maturin
RUSTFLAGS="-C target-cpu=native" maturin develop --release

In this mode, changes to Python code take effect immediately without reinstalling, which is ideal for development. Changes to the Rust code require re-running maturin develop.

Rust

To compile all the Rust binaries in src/bin/:

RUSTFLAGS="-C target-cpu=native" cargo build --release

Details on how to use Tachiom's Rust CLI can be found in docs/RustUsage.md.

Quick start

import numpy as np
import tachiom

# ── Build ─────────────────────────────────────────────────────────────────────
# Inputs (all .npy files):
#   vectors.npy    — [N, dim]   f16  one row per token
#   token_ids.npy  — [N]        i64  vocabulary id of each token
#   doclens.npy    — [n_docs]   i32  number of tokens per document

index = tachiom.Tachiom.build(
    "vectors.npy",
    "token_ids.npy",
    "doclens.npy",
    total_centroids=2_097_152,
)
index.save("my_index.bin")

# ── Load & search ─────────────────────────────────────────────────────────────
index = tachiom.Tachiom.load("my_index.bin")

# queries: [n_queries, n_tokens, dim] f32 array
# (queries.npy is an assumed file name; any array with this shape and dtype works)
queries = np.load("queries.npy").astype(np.float32)
scores, doc_ids = index.batch_search(queries, k=10, num_threads=0)
# scores, doc_ids: [n_queries, k]

See docs/PythonUsage.md for the full API, all build and search parameters, and the two-step TAC workflow.
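The three input files described in the build comments can be produced with plain NumPy. The sketch below writes a tiny synthetic dataset with the stated shapes and dtypes. The embedding size of 128, the vocabulary size of 30522, the random values, and the assumption that token rows are stored in document order (so that doclens partitions the rows) are all illustrative choices, not requirements confirmed by the Tachiom documentation.

```python
import tempfile
from pathlib import Path

import numpy as np

rng = np.random.default_rng(0)
dim = 128                                         # assumed embedding size
doclens = np.array([3, 5, 2], dtype=np.int32)     # tokens per document
n_tokens = int(doclens.sum())                     # N = 10 token rows in total

# One f16 row per token and one i64 vocabulary id per token.
vectors = rng.standard_normal((n_tokens, dim)).astype(np.float16)
token_ids = rng.integers(0, 30522, size=n_tokens, dtype=np.int64)

# Assuming rows are concatenated in document order, document i owns
# rows offsets[i]:offsets[i + 1].
offsets = np.concatenate(([0], np.cumsum(doclens)))

# Sanity checks that mirror the input description.
assert vectors.shape == (n_tokens, dim) and vectors.dtype == np.float16
assert token_ids.shape == (n_tokens,) and token_ids.dtype == np.int64
assert doclens.dtype == np.int32

out_dir = Path(tempfile.mkdtemp())
np.save(out_dir / "vectors.npy", vectors)
np.save(out_dir / "token_ids.npy", token_ids)
np.save(out_dir / "doclens.npy", doclens)
print("wrote toy inputs to", out_dir)
```

A dataset this small only exercises the file layout; a real build needs far more tokens than the total_centroids value passed to the index.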

Datasets

Pre-processed datasets and pre-built indexes are available on HuggingFace, ready to use with the experiment configs in experiments/sigir2026/.

Dataset                   | HuggingFace                     | Index
MS MARCO-v1 (ColBERT v2)  | tuskanny/ms_marco_colbertv2     | tachiom_msmarco_4M_normalized
LoTTE Pooled (ColBERT v2) | tuskanny/lotte_pooled_colbertv2 | tachiom_lotte_2M_normalized

Each dataset contains documents.npy, token_ids.npy, doclens.npy, queries.npy, doc_ids.npy, queries_ids.npy, a qrels .tsv file, and a pre-built Tachiom index. Download with:

pip install huggingface_hub
huggingface-cli download tuskanny/ms_marco_colbertv2 --repo-type dataset --local-dir ./ms_marco
huggingface-cli download tuskanny/lotte_pooled_colbertv2 --repo-type dataset --local-dir ./lotte

Resources

Document          | Description
Python API        | Tachiom and Tac classes, all parameters, search guide
Rust CLI          | bench_tac, tachiom_build, tachiom_search binaries, experiment runner, SIGIR 2026 reproduction
Jupyter notebooks | End-to-end demo on TAC and TACHIOM
Experiments       | TOML configs used for the SIGIR 2026 benchmarks

License

This software is released under the MIT License (see LICENSE).

Citation license

By downloading and using this software, you agree to cite the following paper in any material you produce where it was used to conduct a search or experimentation, whether it be a research paper, dissertation, article, poster, presentation, or documentation. By using this software, you have agreed to the citation license.

Bibliography

This paper has been accepted at SIGIR 2026. The full proceedings entry will be available after the conference.

@misc{martinico2026efficientmultivectorretrievaltokenaware,
      title={Efficient Multivector Retrieval with Token-Aware Clustering and Hierarchical Indexing}, 
      author={Silvio Martinico and Franco Maria Nardini and Cosimo Rulli and Rossano Venturini},
      year={2026},
      eprint={2604.28142},
      archivePrefix={arXiv},
      primaryClass={cs.IR},
      url={https://arxiv.org/abs/2604.28142}, 
}
