
A lightweight, zero-PyTorch ONNX encoder for generic ColBERT models.


🕸️ intextus

License: MIT | Python 3.8+

intextus (Latin for "woven into the text") is a lightweight, PyTorch-free Python library for production use that encodes text into the multi-vector embeddings used by late-interaction ColBERT retrieval.

By replacing PyTorch and Transformers with compiled C++ and Rust backends (ONNX Runtime for inference, Hugging Face's Rust tokenizer for tokenization), intextus produces ColBERT multi-vector embeddings and MaxSim scores in under 65MB of RAM. It is aimed at edge devices, serverless functions (AWS Lambda, Cloudflare Workers), and other resource-constrained environments.


⚡ Key Features

  • No PyTorch or Transformers: Fully decoupled from the heavy PyTorch/Transformers pipeline, so a plain pip install completes in seconds.
  • Micro Memory Footprint: Runs the multi-vector model graph inside ONNX Runtime, using less than 65MB of RAM during inference.
  • Fast Rust Tokenization: Uses Hugging Face's raw Rust tokenization backend directly.
  • Dynamic Punctuation Skiplist: Parses tokenizer.json once at initialization and builds a mask that discards punctuation vectors from document embeddings, matching how ColBERT drops punctuation tokens when it saves an index.
  • Standardized Late Interaction: Exposes native NumPy-based MaxSim calculations.
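
The punctuation skiplist idea can be illustrated with a minimal sketch. This is not the library's own code: the function names are hypothetical, and it assumes the common Hugging Face tokenizer.json layout in which `model.vocab` maps token strings to integer ids. It collects the ids of single-character punctuation tokens once, then filters token vectors against that set.

```python
import json
import string

PUNCTUATION = set(string.punctuation)


def build_punctuation_skiplist(tokenizer_json_text):
    """Return the ids of vocabulary entries that are a single punctuation mark.

    Assumption: data["model"]["vocab"] is a dict of token string -> token id.
    """
    data = json.loads(tokenizer_json_text)
    vocab = data["model"]["vocab"]
    return {tok_id for token, tok_id in vocab.items() if token in PUNCTUATION}


def drop_punctuation(token_ids, vectors, skiplist):
    """Keep only the vectors whose token id is not in the skiplist."""
    return [vec for tok_id, vec in zip(token_ids, vectors) if tok_id not in skiplist]


# Tiny stand-in for a real tokenizer.json (hypothetical four-token vocabulary).
demo_tokenizer_json = json.dumps(
    {"model": {"vocab": {"hello": 0, ",": 1, "world": 2, "!": 3}}}
)

skiplist = build_punctuation_skiplist(demo_tokenizer_json)
kept = drop_punctuation([0, 1, 2, 3], [[0.1], [0.2], [0.3], [0.4]], skiplist)

print(sorted(skiplist))  # ids of "," and "!"
print(kept)              # vectors for "hello" and "world" only
```

Because the set is built once, the per-document cost is a single membership test per token, which is roughly what "zero-overhead" would mean in practice.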

📦 Installation

Install the library directly via pip:

pip install intextus-embed

Note: intextus currently runs inference on the CPU. Hardware acceleration and GPU execution support are planned for a future release.


🚀 Quick Start

Here is how to load a model, extract multi-vector embeddings, and compute late-interaction cross-similarity scores entirely in NumPy:

from intextus import IntextusEncoder, compute_maxsim

# Initialize the encoder (defaults to intextus/mxbai-edge-colbert-v0-17m-onnx)
model = IntextusEncoder()

# Or initialize from a local directory containing 'model.onnx' and 'tokenizer.json'
# model = IntextusEncoder("./my_model_directory")

# Extract query and document embeddings; each has shape (batch_size, sequence_length, dimension)
query_embeddings = model.encode_queries("What is ultra-low latency?")
doc_embeddings = model.encode_docs("ONNX runtime bypasses the PyTorch layer completely.")

# Compute the cross-similarity score via NumPy (using the first item in the batch)
score = compute_maxsim(query_embeddings[0], doc_embeddings[0])
print(f"Relevance Score (MaxSim): {score:.4f}")
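
For readers unfamiliar with MaxSim, the following is a rough NumPy sketch of the standard late-interaction score: for each query token, take the highest dot product against any document token, then sum over query tokens. The names `maxsim_sketch` and `rank_documents` are hypothetical, and this is not necessarily how the library's compute_maxsim is implemented (for example, it may normalize or mask differently).

```python
import numpy as np


def maxsim_sketch(query_vecs, doc_vecs):
    """Late-interaction score for one query and one document.

    query_vecs has shape (num_query_tokens, dim); doc_vecs has shape
    (num_doc_tokens, dim). Vectors are assumed to be already normalized.
    """
    similarity = query_vecs @ doc_vecs.T          # (num_query_tokens, num_doc_tokens)
    return float(similarity.max(axis=1).sum())    # best doc token per query token


def rank_documents(query_vecs, docs):
    """Return document indices sorted by descending score, plus the scores."""
    scores = [maxsim_sketch(query_vecs, doc) for doc in docs]
    order = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return order, scores


# Toy 2-dimensional vectors, chosen so the arithmetic is easy to follow.
query = np.array([[1.0, 0.0], [0.0, 1.0]])
doc_a = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
doc_b = np.array([[0.0, 1.0]])

order, scores = rank_documents(query, [doc_a, doc_b])
print(order)
print(scores)
```

Here doc_a matches both query tokens exactly, so it scores higher than doc_b, which matches only the second one.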

🎯 Supported & Tested Models

intextus is designed for ultra-fast, edge-compatible ColBERT execution. The primary officially supported and fully validated models are:

  • intextus/mxbai-edge-colbert-v0-17m-onnx (Alias: mxbai-edge-colbert-v0-17m; default model): A highly optimized, single-file ONNX representation of the ModernBERT-backed mxbai-edge-colbert-v0-17m (66 MB, 48-dimensional late-interaction embeddings).
  • intextus/mxbai-edge-colbert-v0-32m-onnx (Alias: mxbai-edge-colbert-v0-32m): A larger, higher-capacity ONNX representation of the ModernBERT-backed mxbai-edge-colbert-v0-32m (124 MB, 64-dimensional late-interaction embeddings).
  • intextus/lateon-onnx (Alias: lateon): A high-capacity base ModernBERT-backed model (580 MB, 128-dimensional late-interaction embeddings). Note: LateOn is case-sensitive, so load it with IntextusEncoder("lateon", do_lower_case=False).

Note: Any ColBERT model exported to ONNX via standard Hugging Face/PyLate workflows can be loaded locally by passing the path to a directory that contains its model.onnx and tokenizer.json.
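
Before pointing the encoder at a local export, it can help to confirm that both expected files are present. The helper below is a hypothetical convenience sketch using only the standard library; it is not part of intextus, and it checks file names only, not file contents.

```python
import tempfile
from pathlib import Path

# File names taken from the note above.
REQUIRED_FILES = ("model.onnx", "tokenizer.json")


def missing_model_files(model_dir):
    """Return the required file names that are absent from model_dir."""
    base = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (base / name).is_file()]


# Demonstration against a throwaway directory with placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "tokenizer.json").write_text("{}")
    missing_before = missing_model_files(tmp)   # model.onnx not yet created
    (Path(tmp) / "model.onnx").write_bytes(b"")
    missing_after = missing_model_files(tmp)    # both files now present

print(missing_before)
print(missing_after)
```

If the returned list is empty, the directory has the layout that IntextusEncoder("./my_model_directory") expects according to the Quick Start.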



⚖️ License

This project is licensed under the MIT License. See the LICENSE file for details.


