A lightweight, zero-PyTorch ONNX encoder for generic ColBERT models.

intextus

License: MIT Python 3.8+

intextus is a lightweight, PyTorch-free, production-grade Python library for encoding text into late-interaction ColBERT multi-vector embeddings.

Instead of loading large deep learning frameworks, intextus relies on compiled backends: ONNX Runtime for inference and the Hugging Face Rust tokenizer for tokenization. It produces ColBERT multi-vector embeddings and MaxSim scores in under 65 MB of RAM, with no PyTorch or Transformers dependency. It is intended for edge devices, serverless functions (AWS Lambda, Cloudflare Workers), and other resource-constrained environments.


Key Features

  • No PyTorch or Transformers: Fully decoupled from the standard PyTorch/Transformers stack. A simple pip install completes in seconds.
  • Micro Memory Footprint: Runs the model graph in ONNX Runtime, using less than 65 MB of RAM during inference.
  • Fast Rust Tokenization: Uses Hugging Face's Rust tokenization backend directly.
  • Dynamic Punctuation Skiplist: Parses tokenizer.json at initialization and builds a mask that discards punctuation-token vectors, matching how ColBERT skips punctuation when saving an index.
  • Standardized Late Interaction: Exposes a NumPy-based MaxSim scoring function (compute_maxsim).
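
To illustrate the punctuation skiplist idea, here is a minimal sketch of how such a mask could be built and applied. It is not the library's actual implementation: the helper names build_punctuation_skiplist and mask_punctuation are hypothetical, the toy vocabulary stands in for the vocabulary stored in tokenizer.json, and it assumes that punctuation means single-character tokens found in Python's string.punctuation.

```python
import string

import numpy as np


def build_punctuation_skiplist(vocab):
    """Return the ids of tokens that are a single punctuation character.

    `vocab` maps token text to token id (hypothetical stand-in for the
    vocabulary read from tokenizer.json).
    """
    return {
        token_id
        for token, token_id in vocab.items()
        if len(token) == 1 and token in string.punctuation
    }


def mask_punctuation(token_ids, embeddings, skiplist):
    """Drop the embedding rows whose token id appears in `skiplist`."""
    ids = np.asarray(token_ids)
    keep = ~np.isin(ids, list(skiplist))  # True for rows we keep
    return embeddings[keep]


# Toy vocabulary and a fake (6 tokens x 4 dims) embedding matrix.
vocab = {"[CLS]": 0, "hello": 1, ",": 2, "world": 3, "!": 4, "[SEP]": 5}
skiplist = build_punctuation_skiplist(vocab)

token_ids = [0, 1, 2, 3, 4, 5]
embeddings = np.arange(6 * 4, dtype=np.float32).reshape(6, 4)
filtered = mask_punctuation(token_ids, embeddings, skiplist)
print(sorted(skiplist), filtered.shape)
```

Building the skiplist once at initialization means each encode call only pays for a single vectorized membership test over the token ids.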

Installation

Install the library directly via pip:

pip install intextus-embed

Note: intextus currently runs inference on the CPU by default. Hardware acceleration and GPU execution are planned for a future release.


Quick Start

Here is how to load a model, extract multi-vector embeddings, and compute late-interaction cross-similarity scores entirely in NumPy:

from intextus import IntextusEncoder, compute_maxsim

# Initialize the encoder (defaults to intextus/mxbai-edge-colbert-v0-17m-onnx)
model = IntextusEncoder()

# Or initialize from a local directory containing 'model.onnx' and 'tokenizer.json'
# model = IntextusEncoder("./my_model_directory")

# Extract query and document embeddings (Batch_Size, Sequence_Length, Dimension)
query_embeddings = model.encode_queries("What is ultra-low latency?")
doc_embeddings = model.encode_docs("ONNX runtime bypasses the PyTorch layer completely.")

# Compute the cross-similarity score via NumPy (using the first item in the batch)
score = compute_maxsim(query_embeddings[0], doc_embeddings[0])
print(f"Relevance Score (MaxSim): {score:.4f}")
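
For readers unfamiliar with MaxSim, the following sketch shows the standard late-interaction score in plain NumPy: for each query token vector, take the maximum dot product against all document token vectors, then sum those maxima. The function name maxsim_sketch is hypothetical, and whether compute_maxsim normalizes vectors or handles padding differently is not stated here, so treat this as an illustration of the technique rather than the library's code.

```python
import numpy as np


def maxsim_sketch(query_vecs, doc_vecs):
    """Late-interaction score between one query and one document.

    query_vecs: (num_query_tokens, dim)
    doc_vecs:   (num_doc_tokens, dim)
    """
    # Pairwise dot products: (num_query_tokens, num_doc_tokens)
    sim = query_vecs @ doc_vecs.T
    # Best-matching document token per query token, summed over the query.
    return float(sim.max(axis=1).sum())


# Tiny hand-checkable example with 2 query tokens and 3 document tokens.
q = np.array([[1.0, 0.0], [0.0, 1.0]])
d = np.array([[1.0, 0.0], [0.6, 0.8], [0.0, -1.0]])
toy_score = maxsim_sketch(q, d)
print(f"{toy_score:.4f}")
```

In this toy case the first query token matches the first document token exactly (similarity 1.0) and the second query token best matches the second document token (similarity 0.8), so the score is about 1.8.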

Supported & Tested Models

intextus is designed for ultra-fast, edge-compatible ColBERT execution. The primary officially supported and fully validated models are:

  • intextus/mxbai-edge-colbert-v0-17m-onnx (Alias: mxbai-edge-colbert-v0-17m): A highly optimized, single-file ONNX export of the ModernBERT-backed mxbai-edge-colbert-v0-17m (66 MB, 48-dimensional late-interaction embeddings). (Default Model)
  • intextus/mxbai-edge-colbert-v0-32m-onnx (Alias: mxbai-edge-colbert-v0-32m): A larger, higher-capacity ONNX export of the ModernBERT-backed mxbai-edge-colbert-v0-32m (124 MB, 64-dimensional late-interaction embeddings).
  • intextus/lateon-onnx (Alias: lateon): A high-capacity, ModernBERT-backed base model (580 MB, 128-dimensional late-interaction embeddings). Note: LateOn is case-sensitive, so load it with IntextusEncoder("lateon", do_lower_case=False).

Note: Any ColBERT model exported via standard Hugging Face/PyLate workflows can be loaded locally by passing the path to a directory that contains its model.onnx and tokenizer.json.
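
Before pointing the encoder at a local export, it can help to confirm that the directory has the two expected files. The check below is a small sketch using only the standard library. The helper name missing_model_files is hypothetical and not part of intextus, and the demo uses a temporary directory with an empty placeholder file rather than a real model.

```python
import tempfile
from pathlib import Path

# File names the README says a local model directory must contain.
REQUIRED_FILES = ("model.onnx", "tokenizer.json")


def missing_model_files(model_dir):
    """Return the required file names that are absent from `model_dir`."""
    root = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


# Demo: a directory that has model.onnx but no tokenizer.json.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "model.onnx").touch()  # empty placeholder, not a real model
    demo_missing = missing_model_files(tmp)

print(demo_missing)
```

An empty result means both files are present and the directory path can be passed to IntextusEncoder as shown in the Quick Start.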


License

This project is licensed under the MIT License. See the LICENSE file for details.
