A lightweight, zero-PyTorch ONNX encoder for generic ColBERT models.

🕸️ intextus

License: MIT | Python 3.8+

intextus (Latin for "woven into the text") is an ultra-lightweight, 100% PyTorch-free, production-grade Python library that encodes text into late-interaction ColBERT multi-vector embeddings.

By replacing large deep learning libraries with compiled C++ and Rust backends, intextus produces full ColBERT multi-vector embeddings, ready for MaxSim scoring, in under 65 MB of RAM with no PyTorch or Transformers dependency. It targets edge devices, serverless functions (AWS Lambda, Cloudflare Workers), and other resource-constrained environments.


⚡ Key Features

  • No PyTorch or Transformers: Fully decoupled from the usual PyTorch/Transformers pipeline. A simple pip install completes in seconds.
  • Micro Memory Footprint: Executes multi-vector graphs inside ONNX Runtime, drawing less than 65 MB of RAM during inference.
  • Fast Rust Tokenization: Uses Hugging Face's raw Rust tokenization backend directly.
  • Dynamic Punctuation Skiplist: Dynamically parses tokenizer.json at initialization, creating a zero-overhead mask to discard punctuation vectors, matching ColBERT index-saving behaviors.
  • Standardized Late Interaction: Exposes native NumPy-based MaxSim calculations.
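
The punctuation skiplist described above can be illustrated with a minimal sketch. This is not the library's internal code: the function names are hypothetical, and it assumes the tokenizer.json stores its vocabulary as a token-to-id mapping under model.vocab (the layout used by WordPiece and BPE tokenizers). A toy vocabulary stands in for a real tokenizer.json file.

```python
import json
import string

import numpy as np


def build_punctuation_skip_table(tokenizer_json: dict) -> np.ndarray:
    """Return a boolean table where table[token_id] is True for punctuation tokens."""
    # Assumption: the vocabulary is a {token: id} mapping under model.vocab.
    vocab = tokenizer_json["model"]["vocab"]
    table = np.zeros(max(vocab.values()) + 1, dtype=bool)
    for symbol in string.punctuation:
        token_id = vocab.get(symbol)
        if token_id is not None:
            table[token_id] = True
    return table


def drop_punctuation_vectors(embeddings: np.ndarray,
                             input_ids: np.ndarray,
                             skip_table: np.ndarray) -> np.ndarray:
    """Keep only the token vectors whose token id is not in the skiplist."""
    keep = ~skip_table[input_ids]  # one table lookup per token
    return embeddings[keep]


# Toy stand-in for the parsed contents of a tokenizer.json file (hypothetical vocab).
toy_tokenizer_json = json.loads(
    '{"model": {"vocab": {"[PAD]": 0, "hello": 1, ",": 2, "world": 3, "!": 4}}}'
)
skip_table = build_punctuation_skip_table(toy_tokenizer_json)

input_ids = np.array([1, 2, 3, 4])                        # hello , world !
embeddings = np.arange(8, dtype=np.float32).reshape(4, 2)  # one 2-d vector per token
filtered = drop_punctuation_vectors(embeddings, input_ids, skip_table)
print(filtered.shape)  # (2, 2)
```

Building the table once at initialization keeps the per-document cost to a single array lookup, which is one plausible reading of the "zero-overhead mask" mentioned in the feature list.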

📦 Installation

Install the library directly via pip:

pip install intextus-embed

Note: intextus currently defaults to highly optimized CPU inference. Full hardware acceleration and GPU execution support are planned for a future release.


🚀 Quick Start

Here is how to load a model, extract multi-vector embeddings, and compute late-interaction cross-similarity scores entirely in NumPy:

from intextus import IntextusEncoder, compute_maxsim

# Initialize the encoder (defaults to intextus/mxbai-edge-colbert-v0-17m-onnx)
model = IntextusEncoder()

# Or initialize from a local directory containing 'model.onnx' and 'tokenizer.json'
# model = IntextusEncoder("./my_model_directory")

# Extract query and document embeddings (Batch_Size, Sequence_Length, Dimension)
query_embeddings = model.encode_queries("What is ultra-low latency?")
doc_embeddings = model.encode_docs("ONNX runtime bypasses the PyTorch layer completely.")

# Compute the cross-similarity score via NumPy (using the first item in the batch)
score = compute_maxsim(query_embeddings[0], doc_embeddings[0])
print(f"Relevance Score (MaxSim): {score:.4f}")
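
For readers unfamiliar with MaxSim, the following is a minimal NumPy sketch of the standard ColBERT formulation: for each query token vector, take the highest dot product against any document token vector, then sum over query tokens. The function name maxsim and the toy vectors are illustrative assumptions; whether compute_maxsim applies additional normalization is not stated here.

```python
import numpy as np


def maxsim(query_vecs: np.ndarray, doc_vecs: np.ndarray) -> float:
    """Sum over query tokens of the best dot-product match among document tokens."""
    # similarity[i, j] = dot(query token i, document token j)
    similarity = query_vecs @ doc_vecs.T
    return float(similarity.max(axis=1).sum())


# Toy 2-dimensional token embeddings (real models emit 48 to 128 dimensions).
query = np.array([[1.0, 0.0], [0.0, 1.0]])
doc_a = np.array([[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]])
doc_b = np.array([[0.6, 0.8]])

# Documents may have different token counts; each is scored independently.
scores = {"doc_a": maxsim(query, doc_a), "doc_b": maxsim(query, doc_b)}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # ['doc_a', 'doc_b']
```

Because each query token only needs its single best match, documents of different lengths can be compared without padding them to a common shape.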

🎯 Supported & Tested Models

intextus is designed for ultra-fast, edge-compatible ColBERT execution. The primary officially supported and fully validated models are:

  • intextus/mxbai-edge-colbert-v0-17m-onnx (Alias: mxbai-edge-colbert-v0-17m): The default model. A highly optimized, single-file ONNX export of the ModernBERT-backed mxbai-edge-colbert-v0-17m (66 MB, 48-dimensional late-interaction embeddings).
  • intextus/mxbai-edge-colbert-v0-32m-onnx (Alias: mxbai-edge-colbert-v0-32m): A larger ONNX export of the ModernBERT-backed mxbai-edge-colbert-v0-32m (124 MB, 64-dimensional late-interaction embeddings).
  • intextus/lateon-onnx (Alias: lateon): A high-capacity base ModernBERT-backed model (580 MB, 128-dimensional late-interaction embeddings). Note: LateOn is case-sensitive, so load it with IntextusEncoder("lateon", do_lower_case=False).

Note: Any ColBERT model exported via standard Hugging Face/PyLate workflows can be loaded locally by providing the path to a directory containing its model.onnx and tokenizer.json.
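
Before pointing the encoder at a local export, it can help to confirm that the directory holds both expected files. The helper below is a hypothetical pre-flight check, not part of intextus; it only assumes the two file names given above.

```python
import tempfile
from pathlib import Path

# File names taken from the note above.
REQUIRED_FILES = ("model.onnx", "tokenizer.json")


def missing_model_files(model_dir) -> list:
    """Return the required file names that are absent from model_dir."""
    directory = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (directory / name).is_file()]


# Demonstration with a temporary directory that only contains tokenizer.json.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "tokenizer.json").write_text("{}")
    missing = missing_model_files(tmp)

print(missing)  # ['model.onnx']
```

An empty list means the directory has the layout the encoder expects, so it can be passed on as in the Quick Start, for example IntextusEncoder("./my_model_directory").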



⚖️ License

This project is licensed under the MIT License. See the LICENSE file for details.

Download files

Source Distribution

intextus_embed-0.1.0.tar.gz (12.4 kB view details)

Uploaded Source

Built Distribution

intextus_embed-0.1.0-py3-none-any.whl (9.9 kB view details)

Uploaded Python 3

File details

Details for the file intextus_embed-0.1.0.tar.gz.

File metadata

  • Download URL: intextus_embed-0.1.0.tar.gz
  • Upload date:
  • Size: 12.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for intextus_embed-0.1.0.tar.gz
Algorithm Hash digest
SHA256 d536eadbe11eb2bb804386a601a9299447dd75f60a0027878ac64c9d9bc2da6e
MD5 530492ccd72143e58cc1ce93e153fc4d
BLAKE2b-256 5c47ea36f72ac3ed882f152cb248909866964533220ae9a355bb1afa312891b0

Provenance

The following attestation bundles were made for intextus_embed-0.1.0.tar.gz:

Publisher: publish.yml on Intextus/intextus-embed

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file intextus_embed-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: intextus_embed-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 9.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for intextus_embed-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 b7739ad17137b5c43a7c6ba9ddd3b98a00cef0b6e27c0377223bfe8ee9de7d1c
MD5 6b05e8e48af64258c5d1f169ae7d2c57
BLAKE2b-256 26b399f797c9494d625c01e03e7df85e63a241fe9da4e7bfd04cfdd928ff3d9f

Provenance

The following attestation bundles were made for intextus_embed-0.1.0-py3-none-any.whl:

Publisher: publish.yml on Intextus/intextus-embed

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
