State-of-the-art index for late-interaction multivector retrieval

TACHIOM

TACHIOM is a fast and scalable data structure for late-interaction multi-vector retrieval, written in Rust with Python bindings. It introduces Token-Aware Clustering (TAC), which distributes the coarse-centroid budget proportionally across token types, and a hierarchical Product Quantization scheme for efficient candidate reranking.
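
As a rough illustration of what "distributing the coarse-centroid budget proportionally across token types" can mean, here is a minimal sketch in plain Python. It is not TACHIOM's implementation: the function name `allocate_centroids`, the one-centroid-per-type floor, and the largest-remainder rounding are all assumptions made for this example.

```python
from collections import Counter


def allocate_centroids(token_ids, total_centroids):
    """Split total_centroids across token types in proportion to their frequency.

    Assumptions of this sketch (not taken from TACHIOM): every token type gets
    at least one centroid, and rounding uses the largest-remainder method.
    """
    counts = Counter(token_ids)
    if total_centroids < len(counts):
        raise ValueError("need at least one centroid per token type")

    n_tokens = sum(counts.values())
    spare = total_centroids - len(counts)  # budget left after the 1-per-type floor

    budget, remainders = {}, {}
    for tok, count in counts.items():
        # Integer arithmetic avoids floating-point rounding surprises.
        share, rem = divmod(spare * count, n_tokens)
        budget[tok] = 1 + share
        remainders[tok] = rem

    # Hand the centroids lost to flooring to the largest remainders
    # (ties broken by token id so the result is deterministic).
    leftover = total_centroids - sum(budget.values())
    by_remainder = sorted(counts, key=lambda t: (-remainders[t], t))
    for tok in by_remainder[:leftover]:
        budget[tok] += 1
    return budget
```

For example, `allocate_centroids([0] * 6 + [1] * 3 + [2], 10)` gives the frequent token type 0 the largest share while still reserving centroids for the rare type 2.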

Installation

Python

Quick start (prebuilt wheels)

For most users, this is the easiest option:

pip install tachiom

If a compatible wheel exists for your platform, pip downloads and installs it without compiling anything. Otherwise pip falls back to the source distribution and builds it locally, which requires the Rust toolchain described in the next section.
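
To get a quick idea of whether pip is likely to find a prebuilt wheel, the following sketch compares the running interpreter against the tags in the 0.2.3 wheel file names published on PyPI (CPython 3.10 to 3.13; Linux x86-64 with glibc 2.39+; macOS ARM64 or x86-64). It is only a heuristic: the helper names are hypothetical, minimum macOS versions are not checked, other releases may ship different wheels, and pip's own tag matching remains the authority.

```python
import platform
import sys

# Taken from the 0.2.3 wheel file names; other releases may differ.
WHEEL_PYTHONS = {(3, 10), (3, 11), (3, 12), (3, 13)}
MIN_GLIBC = (2, 39)


def likely_has_wheel(py_version, system, machine, glibc=None):
    """Heuristic check against the published 0.2.3 wheel tags."""
    if tuple(py_version[:2]) not in WHEEL_PYTHONS:
        return False
    machine = machine.lower()
    if system == "Linux":
        return (
            machine in ("x86_64", "amd64")
            and glibc is not None
            and glibc >= MIN_GLIBC
        )
    if system == "Darwin":
        # Minimum macOS versions (11.0 on ARM64, 10.12 on x86-64) are not checked.
        return machine in ("arm64", "x86_64")
    return False


def current_glibc():
    """Return the glibc version as a (major, minor) tuple, or None if not glibc."""
    name, version = platform.libc_ver()
    if name != "glibc" or not version:
        return None
    return tuple(int(part) for part in version.split(".")[:2])


if __name__ == "__main__":
    ok = likely_has_wheel(
        sys.version_info, platform.system(), platform.machine(), current_glibc()
    )
    print("prebuilt wheel likely available" if ok else "expect a source build")
```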

Building from source (maximum performance)

For maximum performance optimized to your CPU, build from source.

Shared prerequisites — both approaches below require Rust nightly:

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
rustup install nightly
rustup default nightly

Approach 1 — compile from PyPI source:

RUSTFLAGS="-C target-cpu=native" pip install --no-binary :all: tachiom

Approach 2 — build from GitHub (development/editable mode):

git clone https://github.com/TusKANNy/tachiom.git
cd tachiom

Create a virtual environment (recommended):

python3 -m venv ./venv
source ./venv/bin/activate  # On Windows: venv\Scripts\activate

Or with conda:

conda create -n tachiom python=3.11
conda activate tachiom

Install maturin and build:

pip install maturin
RUSTFLAGS="-C target-cpu=native" maturin develop --release

Changes to Python code take effect immediately without reinstalling, which is convenient for development. Changes to the Rust code require rerunning maturin develop.

Rust

To compile all the Rust binaries in src/bin/:

RUSTFLAGS="-C target-cpu=native" cargo build --release

Details on how to use TACHIOM's Rust CLI can be found in docs/RustUsage.md.

Quick start

import numpy as np
import tachiom

# ── Build ─────────────────────────────────────────────────────────────────────
# Inputs (all .npy files):
#   vectors.npy    — [N, dim]   f16  one row per token
#   token_ids.npy  — [N]        i64  vocabulary id of each token
#   doclens.npy    — [n_docs]   i32  number of tokens per document

index = tachiom.Tachiom.build(
    "vectors.npy",
    "token_ids.npy",
    "doclens.npy",
    total_centroids=2_097_152,
)
index.save("my_index.bin")

# ── Load & search ─────────────────────────────────────────────────────────────
index = tachiom.Tachiom.load("my_index.bin")

# queries: [n_queries, n_tokens, dim] f32 array.
# "queries.npy" is a placeholder path; substitute your own query embeddings.
queries = np.load("queries.npy").astype(np.float32)
scores, doc_ids = index.batch_search(queries, k=10, num_threads=0)
# scores, doc_ids: [n_queries, k]

See docs/PythonUsage.md for the full API, all build and search parameters, and the two-step TAC workflow.
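
For intuition about what a late-interaction search ranks, the sketch below computes the standard MaxSim score used by ColBERT-style models by brute force: for each query token, take the maximum dot product with any document token, then sum over query tokens. This is a reference for the scoring rule only, not how TACHIOM searches, and whether `batch_search` returns exactly these values is an assumption. The function names are hypothetical, and documents are assumed to be non-empty.

```python
def maxsim(query, doc):
    """Sum over query tokens of the best dot product with any document token."""
    return sum(
        max(sum(q * d for q, d in zip(q_vec, d_vec)) for d_vec in doc)
        for q_vec in query
    )


def brute_force_search(query, docs, k):
    """Return (scores, doc_ids) of the k best documents by exhaustive MaxSim."""
    scored = sorted(
        ((maxsim(query, doc), doc_id) for doc_id, doc in enumerate(docs)),
        key=lambda pair: (-pair[0], pair[1]),  # best score first, then lowest id
    )
    top = scored[:k]
    return [score for score, _ in top], [doc_id for _, doc_id in top]


if __name__ == "__main__":
    query = [[1.0, 0.0], [0.0, 1.0]]           # 2 query tokens, dim 2
    docs = [
        [[1.0, 0.0]],                          # doc 0: 1 token
        [[0.5, 0.5], [0.0, 1.0]],              # doc 1: 2 tokens
        [[0.0, 0.0]],                          # doc 2: 1 token
    ]
    print(brute_force_search(query, docs, k=2))
```

An exhaustive scan like this costs one dot product per (query token, document token) pair, which is the cost an index is meant to avoid.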

Datasets

Pre-processed datasets and pre-built indexes are available on Hugging Face, ready to use with the experiment configs in experiments/sigir2026/.

Dataset | Hugging Face repository | Index
MS MARCO-v1 (ColBERT v2) | tuskanny/ms_marco_colbertv2 | tachiom_msmarco_4M_normalized
LoTTE Pooled (ColBERT v2) | tuskanny/lotte_pooled_colbertv2 | tachiom_lotte_2M_normalized

Each dataset contains documents.npy, token_ids.npy, doclens.npy, queries.npy, doc_ids.npy, queries_ids.npy, a qrels .tsv file, and a pre-built Tachiom index. Download with:

pip install huggingface_hub
huggingface-cli download tuskanny/ms_marco_colbertv2 --repo-type dataset --local-dir ./ms_marco
huggingface-cli download tuskanny/lotte_pooled_colbertv2 --repo-type dataset --local-dir ./lotte
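
The token-level arrays are stored flat, one row per token, and doclens.npy records how many consecutive rows belong to each document. The sketch below shows how per-document slices can be recovered from that layout and doubles as a sanity check on the downloaded files. It assumes documents.npy follows the same [N, dim] layout described for vectors.npy in the quick start; the helper names are hypothetical. The functions work on plain lists and should work equally on arrays loaded with numpy.load.

```python
from itertools import accumulate


def doc_token_ranges(doclens):
    """Return a (start, end) row range for each document, end exclusive."""
    ends = list(accumulate(doclens))
    starts = [0] + ends[:-1]
    return list(zip(starts, ends))


def split_by_document(rows, doclens):
    """Slice a flat, one-row-per-token sequence into per-document chunks."""
    if sum(doclens) != len(rows):
        raise ValueError("doclens must sum to the number of token rows")
    return [rows[start:end] for start, end in doc_token_ranges(doclens)]


if __name__ == "__main__":
    tokens = ["a", "b", "c", "d", "e", "f"]    # stand-in for 6 token rows
    print(split_by_document(tokens, [2, 3, 1]))
```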

Resources

Document | Description
Python API | Tachiom and Tac classes, all parameters, search guide
Rust CLI | bench_tac, tachiom_build, tachiom_search binaries, experiment runner, SIGIR 2026 reproduction
Jupyter notebooks | End-to-end demo on TAC and TACHIOM
Experiments | TOML configs used for the SIGIR 2026 benchmarks

License

This software is released under the MIT License (see LICENSE).

Citation license

By downloading and using this software, you agree to cite the following paper in any material you produce where it was used to conduct a search or experimentation, whether it be a research paper, dissertation, article, poster, presentation, or documentation. By using this software, you have agreed to the citation license.

Bibliography

This paper has been accepted at SIGIR 2026. The full proceedings entry will be available after the conference.

@misc{martinico2026efficientmultivectorretrievaltokenaware,
      title={Efficient Multivector Retrieval with Token-Aware Clustering and Hierarchical Indexing}, 
      author={Silvio Martinico and Franco Maria Nardini and Cosimo Rulli and Rossano Venturini},
      year={2026},
      eprint={2604.28142},
      archivePrefix={arXiv},
      primaryClass={cs.IR},
      url={https://arxiv.org/abs/2604.28142}, 
}

Download files

Download the file for your platform.

Source distribution

tachiom-0.2.3.tar.gz (347.4 kB)

Built distributions

File | Python | Platform | Size
tachiom-0.2.3-cp313-cp313-manylinux_2_39_x86_64.whl | CPython 3.13 | manylinux, glibc 2.39+, x86-64 | 1.3 MB
tachiom-0.2.3-cp313-cp313-macosx_11_0_arm64.whl | CPython 3.13 | macOS 11.0+, ARM64 | 1.2 MB
tachiom-0.2.3-cp313-cp313-macosx_10_12_x86_64.whl | CPython 3.13 | macOS 10.12+, x86-64 | 1.2 MB
tachiom-0.2.3-cp312-cp312-manylinux_2_39_x86_64.whl | CPython 3.12 | manylinux, glibc 2.39+, x86-64 | 1.3 MB
tachiom-0.2.3-cp312-cp312-macosx_11_0_arm64.whl | CPython 3.12 | macOS 11.0+, ARM64 | 1.2 MB
tachiom-0.2.3-cp312-cp312-macosx_10_12_x86_64.whl | CPython 3.12 | macOS 10.12+, x86-64 | 1.2 MB
tachiom-0.2.3-cp311-cp311-manylinux_2_39_x86_64.whl | CPython 3.11 | manylinux, glibc 2.39+, x86-64 | 1.3 MB
tachiom-0.2.3-cp311-cp311-macosx_11_0_arm64.whl | CPython 3.11 | macOS 11.0+, ARM64 | 1.2 MB
tachiom-0.2.3-cp311-cp311-macosx_10_12_x86_64.whl | CPython 3.11 | macOS 10.12+, x86-64 | 1.2 MB
tachiom-0.2.3-cp310-cp310-manylinux_2_39_x86_64.whl | CPython 3.10 | manylinux, glibc 2.39+, x86-64 | 1.3 MB
tachiom-0.2.3-cp310-cp310-macosx_11_0_arm64.whl | CPython 3.10 | macOS 11.0+, ARM64 | 1.2 MB
tachiom-0.2.3-cp310-cp310-macosx_10_12_x86_64.whl | CPython 3.10 | macOS 10.12+, x86-64 | 1.2 MB

File details

Details for the file tachiom-0.2.3.tar.gz.

File metadata

  • Download URL: tachiom-0.2.3.tar.gz
  • Upload date:
  • Size: 347.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for tachiom-0.2.3.tar.gz (algorithm followed by hex digest):
SHA256 385a0459429c6e5ccaae4781d705252e2bc8f41df6e47fdac7585b76d937f109
MD5 1484ecac1c06b1c8a38ad9f2f95b0267
BLAKE2b-256 d6c2e274b2e4de82a192b9a1722e746143b8d49c5ad2765ee65064704ca58f6f

Hashes for tachiom-0.2.3-cp313-cp313-manylinux_2_39_x86_64.whl:
SHA256 791780e66ebd2284bf9991cb538f17c469546a643629e9f3ae7fa65fc0cf40c9
MD5 2879546a52acb3f54714231e4f14edf0
BLAKE2b-256 f2db7dfc0d0567344928adde07f8e2280fea7d6460065c11fe0fa2f9ad0a07e3

Hashes for tachiom-0.2.3-cp313-cp313-macosx_11_0_arm64.whl:
SHA256 e68d04a211145041859d65e527f1d3cacce66007353f6599f4d101766ebb0147
MD5 abedb6525b521716584db00bbab137c4
BLAKE2b-256 1378f84d7dcd0071ad70801a5683b4bbe8e056b070f6034db4b9e01d1e852935

Hashes for tachiom-0.2.3-cp313-cp313-macosx_10_12_x86_64.whl:
SHA256 3196e0fe4d4ddce9a47a6b5e1977fd631ad979f55296989e9f2695ad1c6fd1e1
MD5 f04d61b302b71a7dc51ceb95f2800dc7
BLAKE2b-256 5844fbec071dbc7ca6717ca6f5937010c762d3d5267ee7374ead7e453b78ba0d

Hashes for tachiom-0.2.3-cp312-cp312-manylinux_2_39_x86_64.whl:
SHA256 154eaf32f7582d2bb02c5d3af2ebbe18c70d85950b1755909d61228bae12dfa0
MD5 66bdf36833cfacdf101d9a8fa38e56d5
BLAKE2b-256 bb0147bb30e31657f30e0e4d2d65d8c3f5b5684d98cc28045b536329bbe24fd7

Hashes for tachiom-0.2.3-cp312-cp312-macosx_11_0_arm64.whl:
SHA256 7de5955f83927c0f62f1712c2fbabbb49b5d90bbf570a0896550e48eeadd5166
MD5 5693fec51074454a04d04619cdb02224
BLAKE2b-256 032acc54bb75bcfab66a203e600559505b989ae3b26117e3201a76be9361448e

Hashes for tachiom-0.2.3-cp312-cp312-macosx_10_12_x86_64.whl:
SHA256 a2bff3f3deda3e65937f9c83292fdd3e901616a6cf93c07fbe4d59fd1152ef60
MD5 793f91bc2fd1af10b27b18f6c3d44c5e
BLAKE2b-256 b2b95f4816fd33ecf2c38d35991ae752e6fe4828a22555de9f4c1525a0f1e68d

Hashes for tachiom-0.2.3-cp311-cp311-manylinux_2_39_x86_64.whl:
SHA256 2ef864eb5f9673bd1e6ecc14d2dd8d6337ab22f812f6d004d3a5eb909e708f4a
MD5 d174237abc3a42a65825123ee642ef58
BLAKE2b-256 d9b0b8a8b0143b885ab8eee43f9257b5c38cc9505ee91eb4d60e16fa8955ccd9

Hashes for tachiom-0.2.3-cp311-cp311-macosx_11_0_arm64.whl:
SHA256 fad8986927be545c6578a9ba7e3a72f26753a3bf935cc09dda171df5ab64a8d3
MD5 492223c48c48a0b0ae30a90956a8a5f6
BLAKE2b-256 dab3e3c9bd2d028fa1ee59d5175fb0a327d577aaa42529bebbd1a8de5667a823

Hashes for tachiom-0.2.3-cp311-cp311-macosx_10_12_x86_64.whl:
SHA256 6dae28611e61d886700369880fb862259c7854e30d00ca42d398bceed4f012a6
MD5 a8a9ac69758145b428da1a729448ef15
BLAKE2b-256 0459f297f7e88f7b9ad3fd7cfee48ce6f460587cb1d4b93b35524a9543ff9d91

Hashes for tachiom-0.2.3-cp310-cp310-manylinux_2_39_x86_64.whl:
SHA256 24a8dc7510a6e52a9857d90b26c41ee467f8c0f20e53e86c2db8fc8a1f0cd5e3
MD5 3e8818e7314360b28d436fcd06a3c887
BLAKE2b-256 25d57a40bb5998fda2dcb1243618be01694876538d7d4c988e64b1f7e551fff5

Hashes for tachiom-0.2.3-cp310-cp310-macosx_11_0_arm64.whl:
SHA256 c7d1fe89719df4658144f07caafaaf3a745f36db45635e4611451554cbe4a335
MD5 081f7b383cbfa5b700f4a4c82cb4fac8
BLAKE2b-256 a1b68a0428a23eebd236f12d69aa9836dfad68324c1287e56d08928950020842

Hashes for tachiom-0.2.3-cp310-cp310-macosx_10_12_x86_64.whl:
SHA256 f8b5bb2947f73a0d19232f211fee5c612f8c0443a1a1146212fd5f84f061256e
MD5 dc7221ecf26ccda70dd792bf5a9351e1
BLAKE2b-256 588437ff64ae5a5d6b2993012decce515518b8c18326605abf179d76f6fd626d
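
The digests listed above can be used to confirm that a downloaded file is intact. A minimal sketch with Python's standard hashlib module follows; the helper names `sha256_of_file` and `matches_digest` are hypothetical, and the local file path in the usage example is an assumption about where the download was saved.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected_hex):
    """Return True if the file's SHA256 equals the expected hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For the source distribution, a check could look like this, using the SHA256 value from the listing:

```python
ok = matches_digest(
    "tachiom-0.2.3.tar.gz",  # assumed local path of the downloaded file
    "385a0459429c6e5ccaae4781d705252e2bc8f41df6e47fdac7585b76d937f109",
)
```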
