IVF-PQ index for late-interaction multivector retrieval

Tachiom

Tachiom is a fast and scalable data structure for late-interaction multi-vector retrieval, written in Rust with Python bindings. It introduces Token-Aware Clustering (TAC), which distributes the coarse-centroid budget proportionally across token types, and a hierarchical Product Quantization scheme for efficient candidate reranking.
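To make the idea of a proportional centroid budget concrete, here is a minimal sketch that splits a total budget across token ids according to how often each id occurs. The helper name `allocate_centroids`, the frequency-proportional rule, and the largest-remainder rounding are assumptions made for illustration; they are not Tachiom's actual allocation procedure, which is described in the paper.

```python
import numpy as np


def allocate_centroids(token_ids: np.ndarray, total_centroids: int) -> dict[int, int]:
    """Split `total_centroids` across token ids proportionally to frequency.

    Hypothetical helper (not part of Tachiom). Uses integer arithmetic and
    largest-remainder rounding so the shares sum exactly to the budget.
    """
    ids, counts = np.unique(token_ids, return_counts=True)
    n = int(counts.sum())
    scaled = counts.astype(np.int64) * total_centroids
    shares = scaled // n          # guaranteed whole centroids per token id
    remainders = scaled % n       # fractional part, kept as an integer numerator
    # Hand the leftover centroids to the ids with the largest remainders.
    leftover = total_centroids - int(shares.sum())
    order = np.argsort(-remainders, kind="stable")
    shares[order[:leftover]] += 1
    return {int(t): int(s) for t, s in zip(ids, shares)}


token_ids = np.array([7, 7, 7, 7, 7, 7, 42, 42, 42, 99], dtype=np.int64)
budget = allocate_centroids(token_ids, total_centroids=5)
print(budget)
```

In this toy run the rarest id receives no centroid at all. A real allocation scheme would presumably handle rare tokens differently (for example with a minimum share or by pooling them), which this sketch does not attempt.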

Installation

Python

Tachiom is a Rust library with Python bindings built via maturin.

Prerequisites

Install Rust via rustup:

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

Activate the nightly toolchain (required):

rustup install nightly
rustup default nightly

Build from source

  1. Clone the repository:
git clone git@github.com:TusKANNy/tachiom.git
cd tachiom
  2. Create a virtual environment (recommended):
python3 -m venv ./venv
source ./venv/bin/activate  # On Windows: venv\Scripts\activate

Or with conda:

conda create -n tachiom python=3.11
conda activate tachiom
  3. Install maturin:
pip install maturin
  4. Build and install in editable mode:
RUSTFLAGS="-C target-cpu=native" maturin develop --release

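The target-cpu=native flag lets the compiler use the SIMD instruction sets available on the build machine's CPU and is strongly recommended for performance. A binary built this way may not run on machines whose CPU lacks those instructions.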

Rust

To compile all the Rust binaries in src/bin/:

RUSTFLAGS="-C target-cpu=native" cargo build --release

Details on how to use Tachiom's Rust CLI can be found in docs/RustUsage.md.

Quick start

import tachiom

# ── Build ─────────────────────────────────────────────────────────────────────
# Inputs (all .npy files):
#   vectors.npy    — [N, dim]   f16  one row per token
#   token_ids.npy  — [N]        i64  vocabulary id of each token
#   doclens.npy    — [n_docs]   i32  number of tokens per document

index = tachiom.Tachiom.build(
    "vectors.npy",
    "token_ids.npy",
    "doclens.npy",
    total_centroids=2_097_152,
)
index.save("my_index.bin")

# ── Load & search ─────────────────────────────────────────────────────────────
index = tachiom.Tachiom.load("my_index.bin")

# queries: [n_queries, n_tokens, dim] f32 array of query token embeddings, supplied by the caller
scores, doc_ids = index.batch_search(queries, k=10, num_threads=0)
# scores, doc_ids: [n_queries, k]
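The three input files can be prototyped with NumPy before real embeddings are available. The sketch below is a minimal example with random data that relies only on the shapes and dtypes listed in the build comments. The helper name `make_toy_inputs`, the sizes, and the vocabulary size are illustrative assumptions. It is also an assumption, not stated above, that token rows are stored document by document so that `doclens` sums to N.

```python
import numpy as np


def make_toy_inputs(n_docs=4, dim=8, vocab_size=100, seed=0):
    """Build random arrays with the shapes and dtypes the quick start lists."""
    rng = np.random.default_rng(seed)
    doclens = rng.integers(2, 6, size=n_docs).astype(np.int32)  # tokens per document
    n_tokens = int(doclens.sum())                                 # N
    vectors = rng.standard_normal((n_tokens, dim)).astype(np.float16)
    token_ids = rng.integers(0, vocab_size, size=n_tokens).astype(np.int64)
    return vectors, token_ids, doclens


vectors, token_ids, doclens = make_toy_inputs()
np.save("vectors.npy", vectors)
np.save("token_ids.npy", token_ids)
np.save("doclens.npy", doclens)

# Row range of each document under the assumed document-by-document layout.
offsets = np.concatenate(([0], np.cumsum(doclens)))
first_doc_rows = vectors[offsets[0]:offsets[1]]
```

Data this small is only useful for checking file formats; a realistic `total_centroids` value needs a far larger collection.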

See docs/PythonUsage.md for the full API, all build and search parameters, and the two-step TAC workflow.
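For small collections, an exhaustive scorer is a handy reference when checking the results of an approximate index. The sketch below assumes the standard late-interaction (MaxSim) score, in which each query token contributes its maximum dot product over a document's tokens. `maxsim_topk` is a hypothetical helper, not part of the Tachiom API, and whether Tachiom's returned scores match it exactly is not stated here. It also assumes that rows are grouped by document and that every document has at least one token.

```python
import numpy as np


def maxsim_topk(query, doc_vectors, doclens, k):
    """Exhaustive MaxSim: sum over query tokens of the max dot product per document.

    query:       [n_tokens, dim] float array
    doc_vectors: [N, dim] float array, rows grouped by document
    doclens:     [n_docs] number of rows per document (all > 0)
    """
    sims = query.astype(np.float32) @ doc_vectors.astype(np.float32).T  # [n_tokens, N]
    starts = np.concatenate(([0], np.cumsum(doclens)[:-1]))
    # Max over each document's token columns, then sum over query tokens.
    per_doc = np.maximum.reduceat(sims, starts, axis=1).sum(axis=0)      # [n_docs]
    top = np.argsort(-per_doc, kind="stable")[:k]
    return per_doc[top], top


# Two toy documents: doc 0 has two tokens, doc 1 has one.
docs = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]], dtype=np.float16)
lens = np.array([2, 1], dtype=np.int32)
query = np.array([[1.0, 0.0], [0.0, 2.0]], dtype=np.float32)

scores, top = maxsim_topk(query, docs, lens, k=2)
print(scores, top)
```

The cost grows with the total number of tokens, so this is only practical as a ground-truth check on a sample.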

Resources

Document: description
  • Python API: Tachiom and Tac classes, all parameters, search guide
  • Rust CLI: bench_tac, tachiom_build, tachiom_search binaries, experiment runner, SIGIR 2026 reproduction
  • Jupyter notebooks: End-to-end demo on TAC and TACHIOM
  • Experiments: TOML configs used for the SIGIR 2026 benchmarks

License

This software is released under the MIT License (see LICENSE).

Citation license

By downloading and using this software, you agree to cite the following paper in any material you produce where it was used to conduct a search or experimentation, whether it be a research paper, dissertation, article, poster, presentation, or documentation. By using this software, you have agreed to the citation license.

Bibliography

This paper has been accepted at SIGIR 2026. The full proceedings entry will be available after the conference.

@misc{martinico2026efficientmultivectorretrievaltokenaware,
      title={Efficient Multivector Retrieval with Token-Aware Clustering and Hierarchical Indexing}, 
      author={Silvio Martinico and Franco Maria Nardini and Cosimo Rulli and Rossano Venturini},
      year={2026},
      eprint={2604.28142},
      archivePrefix={arXiv},
      primaryClass={cs.IR},
      url={https://arxiv.org/abs/2604.28142}, 
}

Download files

Source Distribution

tachiom-0.1.0.tar.gz (6.3 MB)

Uploaded Source

Built Distributions

tachiom-0.1.0-cp313-cp313-manylinux_2_39_x86_64.whl (1.3 MB)

Uploaded CPython 3.13, manylinux: glibc 2.39+ x86-64

tachiom-0.1.0-cp313-cp313-macosx_11_0_arm64.whl (1.2 MB)

Uploaded CPython 3.13, macOS 11.0+ ARM64

tachiom-0.1.0-cp313-cp313-macosx_10_12_x86_64.whl (1.2 MB)

Uploaded CPython 3.13, macOS 10.12+ x86-64

tachiom-0.1.0-cp312-cp312-manylinux_2_39_x86_64.whl (1.3 MB)

Uploaded CPython 3.12, manylinux: glibc 2.39+ x86-64

tachiom-0.1.0-cp312-cp312-macosx_11_0_arm64.whl (1.2 MB)

Uploaded CPython 3.12, macOS 11.0+ ARM64

tachiom-0.1.0-cp312-cp312-macosx_10_12_x86_64.whl (1.2 MB)

Uploaded CPython 3.12, macOS 10.12+ x86-64

tachiom-0.1.0-cp311-cp311-manylinux_2_39_x86_64.whl (1.3 MB)

Uploaded CPython 3.11, manylinux: glibc 2.39+ x86-64

tachiom-0.1.0-cp311-cp311-macosx_11_0_arm64.whl (1.2 MB)

Uploaded CPython 3.11, macOS 11.0+ ARM64

tachiom-0.1.0-cp311-cp311-macosx_10_12_x86_64.whl (1.2 MB)

Uploaded CPython 3.11, macOS 10.12+ x86-64

tachiom-0.1.0-cp310-cp310-manylinux_2_39_x86_64.whl (1.3 MB)

Uploaded CPython 3.10, manylinux: glibc 2.39+ x86-64

tachiom-0.1.0-cp310-cp310-macosx_11_0_arm64.whl (1.2 MB)

Uploaded CPython 3.10, macOS 11.0+ ARM64

tachiom-0.1.0-cp310-cp310-macosx_10_12_x86_64.whl (1.2 MB)

Uploaded CPython 3.10, macOS 10.12+ x86-64

File details

Details for the file tachiom-0.1.0.tar.gz.

File metadata

  • Download URL: tachiom-0.1.0.tar.gz
  • Upload date:
  • Size: 6.3 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for tachiom-0.1.0.tar.gz
Algorithm Hash digest
SHA256 d16c5591dcff4238c80bba03aa2ba6a6ec0013f7c3de863bab94183235e7b313
MD5 55429c04faf217468d7db8ba8d077737
BLAKE2b-256 1932123bb70845a3bf2f7cf9ea945106ca71fcbc782bd248f24c4629b72c22a2

See more details on using hashes here.
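A downloaded archive can be checked against the SHA256 digest listed above before it is installed. The following is a minimal sketch using Python's standard `hashlib`; the helper name `sha256_of` and the assumption that the archive sits in the current directory are illustrative.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# SHA256 digest listed above for the source distribution.
EXPECTED = "d16c5591dcff4238c80bba03aa2ba6a6ec0013f7c3de863bab94183235e7b313"
archive = Path("tachiom-0.1.0.tar.gz")  # assumed to be in the current directory

if archive.exists():
    ok = sha256_of(archive) == EXPECTED
    print("digest matches" if ok else "digest MISMATCH, do not install")
```

pip can enforce the same check automatically when a requirements file pins hashes and is installed with `--require-hashes`.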

File details

Details for the file tachiom-0.1.0-cp313-cp313-manylinux_2_39_x86_64.whl.

File hashes

Hashes for tachiom-0.1.0-cp313-cp313-manylinux_2_39_x86_64.whl
Algorithm Hash digest
SHA256 31ef54eb5041087556c912bdf421c3bdab7315b57144c213734cc2b0f64fcc3f
MD5 ca0e0747e862052327c920da46d79cc5
BLAKE2b-256 d08ff4db0b32d3a9ee449192df5e0f0d17c8c4d121e7cf9720962be5e1244ab6

File details

Details for the file tachiom-0.1.0-cp313-cp313-macosx_11_0_arm64.whl.

File hashes

Hashes for tachiom-0.1.0-cp313-cp313-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 1a86e77858883f0d4569f1656327b35004b479c3572ebddefea39973cdf39352
MD5 28ea5cc6e24d828b703d78feefcb6fe4
BLAKE2b-256 a25c60a40bae3d7f416cddaf66457942976cb5b18fc6c3c9cb744534aeb095c4

File details

Details for the file tachiom-0.1.0-cp313-cp313-macosx_10_12_x86_64.whl.

File hashes

Hashes for tachiom-0.1.0-cp313-cp313-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 53e1e54b3cdfab18f1cc761f96c3c02076fcd5ee2d78a099f80d08040db81a7c
MD5 7cb06882e53d3cbd6e108ce964ceacfc
BLAKE2b-256 14f6a4668752f0a07daa825e3985a13b9a0ff72c2bc584208ba5f722fe854723

File details

Details for the file tachiom-0.1.0-cp312-cp312-manylinux_2_39_x86_64.whl.

File hashes

Hashes for tachiom-0.1.0-cp312-cp312-manylinux_2_39_x86_64.whl
Algorithm Hash digest
SHA256 736db89461825c3fd3d62d3b2b680f5458c830d215acdf656be7cb1958a81b7a
MD5 eabdf86f476f1cb2bfbf8f175ba25d9f
BLAKE2b-256 48b9dea44ad941916fdd3643959a7c3dcd593b3b7665f59113fe05e660eef388

File details

Details for the file tachiom-0.1.0-cp312-cp312-macosx_11_0_arm64.whl.

File hashes

Hashes for tachiom-0.1.0-cp312-cp312-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 914c6c66eb6f6dbfeca6a1f2b6a19f2c5c1f8bfa43be03a69beea58363083f48
MD5 ce752ff1cbeda7b5d716a2d7cf290b1a
BLAKE2b-256 cd9cc6f329b6c9ddb7087dc371e0fd9da5adbf11bd637941240f9ae9415efae6

File details

Details for the file tachiom-0.1.0-cp312-cp312-macosx_10_12_x86_64.whl.

File hashes

Hashes for tachiom-0.1.0-cp312-cp312-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 ee6124aa5b6af09f8a08aa64b42a78072227ee069cb2746ba10d75b0052a95a8
MD5 226fb591cf1410f7284d979f7a3626d0
BLAKE2b-256 7a6e4d98efe7246ebdb5d6849220e728fb1e21f44b7910ab928511abdc0f7e45

File details

Details for the file tachiom-0.1.0-cp311-cp311-manylinux_2_39_x86_64.whl.

File hashes

Hashes for tachiom-0.1.0-cp311-cp311-manylinux_2_39_x86_64.whl
Algorithm Hash digest
SHA256 b83592d3ec1a55bb1a6f7fee4a7b93dead05a95a1bc0d25d5cdd97b68bf73b21
MD5 083b9c04903c67793235ec67c6905e8d
BLAKE2b-256 aa47bab7d9ef9b36f4620ca0cf157da49b515cc887cd03305052733b28001bc1

File details

Details for the file tachiom-0.1.0-cp311-cp311-macosx_11_0_arm64.whl.

File hashes

Hashes for tachiom-0.1.0-cp311-cp311-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 69036880872a96ff9e6003a0b7450fed68398304b583483ec2803fe14d5f094b
MD5 6c1e7e5c5d2b3432177a2c9106b7d892
BLAKE2b-256 59b42bf7d1cbe5f3a7db7ea151910ada586a40ac77f3fa31828294e8aa81fcd5

File details

Details for the file tachiom-0.1.0-cp311-cp311-macosx_10_12_x86_64.whl.

File hashes

Hashes for tachiom-0.1.0-cp311-cp311-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 6a0fd07e368a8d7099aff6fe98347b950c0cab971564949a9da8faa819ce5b87
MD5 a1fe4368103b15c0ae3b1eb3b96c412b
BLAKE2b-256 dde89c6ee7c832e1dc6ad228ca20b7d03df3c04212774ee8fa92b283ee8b370f

File details

Details for the file tachiom-0.1.0-cp310-cp310-manylinux_2_39_x86_64.whl.

File hashes

Hashes for tachiom-0.1.0-cp310-cp310-manylinux_2_39_x86_64.whl
Algorithm Hash digest
SHA256 45d266c4e3ef8a585ed72b1bfbce9fe4d0d876ec60d713c5086f702da914659c
MD5 0de403748a30fcba29a84d0207e3dfa0
BLAKE2b-256 c821d16b4f46dab4018ee559f36593efde2b6076f1fc7ca228368372a785b3e9

File details

Details for the file tachiom-0.1.0-cp310-cp310-macosx_11_0_arm64.whl.

File hashes

Hashes for tachiom-0.1.0-cp310-cp310-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 06a0391b689f258be3ba1af044e5cf2d363b65f19a7507b8313faae2d532dbd8
MD5 7aa692202a4c3b6e50a5a6d214d57b99
BLAKE2b-256 77906e6c275b52289c00d20714ab6436e4f84d3fd89e01e304b195b7c7ffd1b0

File details

Details for the file tachiom-0.1.0-cp310-cp310-macosx_10_12_x86_64.whl.

File hashes

Hashes for tachiom-0.1.0-cp310-cp310-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 b3923f3ebaf6c16b6e52610d6ebdbe92e4556056af85e96d2e146ae7b1ed7e4d
MD5 3be8544d9efc4109d41fef9b9c53b1e2
BLAKE2b-256 f575444a81d99626e67f573c915e094bd9c3f4fbadb7186e5c670e8d6183c4a4
