A lightweight, zero-PyTorch ONNX encoder for generic ColBERT models.
🕸️ intextus
intextus (Latin for "woven into the text") is an ultra-lightweight, PyTorch-free Python library for encoding late-interaction ColBERT multi-vector embeddings in production.
By replacing large deep learning libraries with compiled C++/Rust backends (ONNX Runtime and Hugging Face's Rust tokenizers), intextus produces full ColBERT multi-vector embeddings for MaxSim scoring in under 65 MB of RAM, with no PyTorch or Transformers dependency. It targets edge devices, serverless functions (AWS Lambda, Cloudflare Workers), and other resource-constrained environments.
⚡ Key Features
- No PyTorch or Transformers: Fully decoupled from the heavy standard library pipeline. A simple `pip install` completes in seconds.
- Micro Memory Footprint: Executes multi-vector graphs inside ONNX Runtime, drawing less than 65 MB of RAM during inference.
- Fast Rust Tokenization: Uses Hugging Face's raw Rust tokenization backend directly.
- Dynamic Punctuation Skiplist: Dynamically parses `tokenizer.json` at initialization, creating a zero-overhead mask to discard punctuation vectors, matching ColBERT index-saving behaviors.
- Standardized Late Interaction: Exposes native NumPy-based MaxSim calculations.
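To make the punctuation skiplist idea concrete, here is a minimal sketch of how such a mask could be built. It is not the library's actual code: the `vocab` dictionary is a hypothetical, trimmed-down stand-in for the token-to-ID mapping found in a `tokenizer.json` file, and `build_skiplist` and `punctuation_mask` are illustrative names.

```python
import string

import numpy as np

# Hypothetical stand-in for the token -> ID vocabulary of a tokenizer.json file.
vocab = {"[PAD]": 0, "[CLS]": 1, "[SEP]": 2, "hello": 3, ",": 4, "world": 5, "!": 6}

PUNCTUATION = set(string.punctuation)


def build_skiplist(vocab: dict) -> np.ndarray:
    """Collect the IDs of tokens that are a single ASCII punctuation mark."""
    ids = [tok_id for tok, tok_id in vocab.items() if tok in PUNCTUATION]
    return np.array(sorted(ids), dtype=np.int64)


def punctuation_mask(token_ids: np.ndarray, skiplist: np.ndarray) -> np.ndarray:
    """Return True for tokens to keep and False for punctuation tokens."""
    return np.isin(token_ids, skiplist, invert=True)


skiplist = build_skiplist(vocab)                    # computed once, at load time
token_ids = np.array([1, 3, 4, 5, 6, 2])            # [CLS] hello , world ! [SEP]
embeddings = np.random.default_rng(0).normal(size=(6, 48))  # one vector per token
kept = embeddings[punctuation_mask(token_ids, skiplist)]    # drop punctuation rows
print(skiplist.tolist(), kept.shape)
```

Because the skiplist is computed once up front, the per-call cost is a single vectorized membership test over the token IDs.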
📦 Installation
Install the library directly via pip:
```bash
pip install intextus-embed
```
> [!NOTE]
> `intextus` currently defaults to highly optimized CPU inference. Full hardware acceleration and GPU execution support are planned for a future release.
🚀 Quick Start
Here is how to load a model, extract multi-vector embeddings, and compute late-interaction cross-similarity scores entirely in NumPy:
```python
from intextus import IntextusEncoder, compute_maxsim

# Initialize the encoder (defaults to intextus/mxbai-edge-colbert-v0-17m-onnx)
model = IntextusEncoder()

# Or initialize from a local directory containing 'model.onnx' and 'tokenizer.json'
# model = IntextusEncoder("./my_model_directory")

# Extract query and document embeddings (Batch_Size, Sequence_Length, Dimension)
query_embeddings = model.encode_queries("What is ultra-low latency?")
doc_embeddings = model.encode_docs("ONNX runtime bypasses the PyTorch layer completely.")

# Compute the cross-similarity score via NumPy (using the first item in the batch)
score = compute_maxsim(query_embeddings[0], doc_embeddings[0])
print(f"Relevance Score (MaxSim): {score:.4f}")
```
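
For readers unfamiliar with the scoring step: MaxSim takes each query token vector, finds its best-matching document token vector, and sums those maxima. Below is a minimal NumPy sketch, assuming dot-product similarity over 2-D `(tokens, dim)` arrays. `maxsim_sketch` is a hypothetical name used for illustration and is not claimed to be how `compute_maxsim` is implemented.

```python
import numpy as np


def maxsim_sketch(query_vecs: np.ndarray, doc_vecs: np.ndarray) -> float:
    """Late-interaction score for one query and one document.

    query_vecs has shape (num_query_tokens, dim);
    doc_vecs has shape (num_doc_tokens, dim).
    """
    sim = query_vecs @ doc_vecs.T        # (q, d) token-to-token similarities
    return float(sim.max(axis=1).sum())  # best doc token per query token, summed


# Toy example with two query tokens and three document tokens.
q = np.array([[1.0, 0.0], [0.0, 1.0]])
d = np.array([[1.0, 0.0], [0.6, 0.8], [0.0, -1.0]])
print(maxsim_sketch(q, d))
```

In the toy example the first query token matches the first document token exactly (similarity 1.0) and the second query token is best served by the second document token (similarity 0.8), so the two maxima add up to about 1.8.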
🎯 Supported & Tested Models
intextus is designed for ultra-fast, edge-compatible ColBERT execution. The primary officially supported and fully validated models are:
- `intextus/mxbai-edge-colbert-v0-17m-onnx` (Alias: `mxbai-edge-colbert-v0-17m`): A highly optimized, single-file ONNX representation of the ModernBERT-backed `mxbai-edge-colbert-v0-17m` (66 MB, 48-dimensional late-interaction embeddings). (Default Model)
- `intextus/mxbai-edge-colbert-v0-32m-onnx` (Alias: `mxbai-edge-colbert-v0-32m`): A larger, higher-capacity ONNX representation of the ModernBERT-backed `mxbai-edge-colbert-v0-32m` (124 MB, 64-dimensional late-interaction embeddings).
- `intextus/lateon-onnx` (Alias: `lateon`): A high-capacity base ModernBERT-backed model (580 MB, 128-dimensional late-interaction embeddings). Note: LateOn is case-sensitive, so load it with `IntextusEncoder("lateon", do_lower_case=False)`.
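
The embedding dimension directly affects how much space each encoded document takes, since a multi-vector embedding stores one vector per token. The rough arithmetic can be sketched as follows, assuming float32 values (4 bytes each) and a hypothetical 300-token document; neither figure comes from the library itself.

```python
def multivector_bytes(num_tokens: int, dim: int, bytes_per_value: int = 4) -> int:
    """Size in bytes of one document's (num_tokens, dim) embedding matrix."""
    return num_tokens * dim * bytes_per_value


# Dimensions taken from the model list above; 300 tokens is an arbitrary example.
for name, dim in [
    ("mxbai-edge-colbert-v0-17m", 48),
    ("mxbai-edge-colbert-v0-32m", 64),
    ("lateon", 128),
]:
    size_kib = multivector_bytes(300, dim) / 1024
    print(f"{name}: {size_kib:.1f} KiB per 300-token document")
```

Under these assumptions, moving from 48 to 128 dimensions multiplies per-document storage by roughly 2.7.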
> [!NOTE]
> Any ColBERT model exported via standard Hugging Face/PyLate workflows can be loaded locally by providing the path to its `model.onnx` and `tokenizer.json`.
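
Before pointing the encoder at a local directory, it can help to confirm that both files are present. The helper below is a small sketch for that check; `missing_model_files` and `./my_model_directory` are illustrative names and not part of the `intextus` API.

```python
from pathlib import Path

# The two files the note above says a local model directory must contain.
REQUIRED_FILES = ("model.onnx", "tokenizer.json")


def missing_model_files(model_dir) -> list:
    """Return the required file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


missing = missing_model_files("./my_model_directory")
if missing:
    print("Missing files:", ", ".join(missing))
else:
    print("Directory looks complete.")
```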
⚖️ License
This project is licensed under the MIT License. See the LICENSE file for details.
File details
Details for the file intextus_embed-0.1.0.tar.gz.
File metadata
- Download URL: intextus_embed-0.1.0.tar.gz
- Upload date:
- Size: 12.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d536eadbe11eb2bb804386a601a9299447dd75f60a0027878ac64c9d9bc2da6e |
| MD5 | 530492ccd72143e58cc1ce93e153fc4d |
| BLAKE2b-256 | 5c47ea36f72ac3ed882f152cb248909866964533220ae9a355bb1afa312891b0 |
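
A downloaded archive can be checked against the published SHA256 digest with the standard library alone. The sketch below assumes the source distribution was saved to the current directory under its original file name; `sha256_of` is an illustrative helper.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size: int = 1 << 16) -> str:
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in fixed-size chunks so large files are never fully in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# SHA256 digest listed for intextus_embed-0.1.0.tar.gz.
EXPECTED = "d536eadbe11eb2bb804386a601a9299447dd75f60a0027878ac64c9d9bc2da6e"
archive = Path("intextus_embed-0.1.0.tar.gz")  # assumed local download location

if archive.is_file():
    print("digest matches" if sha256_of(archive) == EXPECTED else "digest MISMATCH")
else:
    print(f"{archive} not found; download it first")
```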
Provenance
The following attestation bundles were made for intextus_embed-0.1.0.tar.gz:
Publisher: publish.yml on Intextus/intextus-embed
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: intextus_embed-0.1.0.tar.gz
- Subject digest: d536eadbe11eb2bb804386a601a9299447dd75f60a0027878ac64c9d9bc2da6e
- Sigstore transparency entry: 1847862335
- Sigstore integration time:
- Permalink: Intextus/intextus-embed@ab8a7333de6e7e71a54286efa676a7fbbc1dad9e
- Branch / Tag: refs/heads/main
- Owner: https://github.com/Intextus
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@ab8a7333de6e7e71a54286efa676a7fbbc1dad9e
- Trigger Event: push
File details
Details for the file intextus_embed-0.1.0-py3-none-any.whl.
File metadata
- Download URL: intextus_embed-0.1.0-py3-none-any.whl
- Upload date:
- Size: 9.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b7739ad17137b5c43a7c6ba9ddd3b98a00cef0b6e27c0377223bfe8ee9de7d1c |
| MD5 | 6b05e8e48af64258c5d1f169ae7d2c57 |
| BLAKE2b-256 | 26b399f797c9494d625c01e03e7df85e63a241fe9da4e7bfd04cfdd928ff3d9f |
Provenance
The following attestation bundles were made for intextus_embed-0.1.0-py3-none-any.whl:
Publisher: publish.yml on Intextus/intextus-embed
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: intextus_embed-0.1.0-py3-none-any.whl
- Subject digest: b7739ad17137b5c43a7c6ba9ddd3b98a00cef0b6e27c0377223bfe8ee9de7d1c
- Sigstore transparency entry: 1847862442
- Sigstore integration time:
- Permalink: Intextus/intextus-embed@ab8a7333de6e7e71a54286efa676a7fbbc1dad9e
- Branch / Tag: refs/heads/main
- Owner: https://github.com/Intextus
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@ab8a7333de6e7e71a54286efa676a7fbbc1dad9e
- Trigger Event: push