Dense embeddings and small-model inference for the Kelvin Agentic OS — Rust-native ONNX backend, model2vec static lookup, optional GPU
kaos-nlp-transformers
Part of Kelvin Agentic OS (KAOS) — open agentic infrastructure for legal work, built by 273 Ventures. See the full KAOS package map for the rest of the stack.
kaos-nlp-transformers is the dense-embedding and small-model inference
layer for KAOS — a typed Python API over an in-tree Rust cdylib that
calls ort (libonnxruntime via Rust)
to turn text into float32 vectors and back. It ships a license-vetted
model registry, an optional cross-encoder reranker, and a semantic-dedup
level that plugs into kaos-content's deduplication framework.
It is dependency-light at the BASE: the install pulls in only numpy,
huggingface_hub, and the core KAOS runtime (kaos-core,
kaos-content, kaos-nlp-core). No PyTorch, no Python fastembed,
no Python onnxruntime — the inference path is a Rust cdylib
(kaos_nlp_transformers._rust) shipped inside the wheel; libonnxruntime
is statically linked. Both embedding (EmbeddingModel) and
cross-encoder reranking (CrossEncoderReranker) run through the same
backend on CPU out of the box. Optional extras layer in adjacencies —
[gpu] for the GPU companion wheel (ort/cuda EP, NVIDIA),
[openvino] for Intel OpenVINO acceleration,
[model2vec] for the static-numpy lookup backend (~500x CPU speedup),
[clustering] for SciPy-backed semantic dedup, and [mcp] for the
MCP tool surface. Free-threaded Python (3.13t / 3.14t) is supported.
Install
uv add kaos-nlp-transformers
# or
pip install kaos-nlp-transformers
kaos-nlp-transformers requires Python 3.13 or newer (free-threaded
3.13t / 3.14t supported). The default install is CPU-only via the Rust
ort backend. Add the extras you need:
uv add "kaos-nlp-transformers[gpu]" # NVIDIA CUDA companion wheel (0.2.0a2)
uv add "kaos-nlp-transformers[openvino]" # Intel CPU / GPU acceleration (0.2.0a2)
uv add "kaos-nlp-transformers[model2vec]" # Static-numpy backend (~500x CPU)
uv add "kaos-nlp-transformers[clustering]" # SemanticDedupLevel (scipy)
uv add "kaos-nlp-transformers[mcp]" # MCP tool surface
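Because each extra is optional, it can help to confirm which ones are importable before relying on them. The sketch below uses only the standard library. The mapping from extra name to top-level module name is an assumption made for illustration (for example, that `[clustering]` provides `scipy`); it is not taken from the package metadata.

```python
from importlib.util import find_spec

# Hypothetical mapping: extra name -> top-level module it is assumed to install.
ASSUMED_EXTRA_MODULES = {
    "model2vec": "model2vec",
    "clustering": "scipy",
    "mcp": "mcp",
}


def available_extras(mapping=ASSUMED_EXTRA_MODULES):
    """Return {extra_name: bool} depending on whether each module can be found."""
    return {extra: find_spec(module) is not None for extra, module in mapping.items()}


if __name__ == "__main__":
    for extra, present in available_extras().items():
        print(f"[{extra}]: {'installed' if present else 'missing'}")
```

`find_spec` only locates the module without importing it, so the check stays cheap even when a heavy dependency is present.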
0.2.0 migration note (KNT-601). Audit KNT-601 retired the Python
fastembed wrapper. Inference now goes through a Rust cdylib (ort + libonnxruntime, statically linked). Same models, same outputs (per-row cosine ≥ 0.9999 vs the prior backend). The EmbeddingModel.load / EmbeddingModel.embed / CrossEncoderReranker public API is unchanged. The [gpu] / [openvino] extras are no-op stubs in 0.2.0a1; the GPU companion wheel ships in 0.2.0a2. The [torch] no-op alias from KNT-501 is preserved for one more cycle and will be removed in 0.3.0. The EmbeddingRetriever text-only retriever is deprecated in favor of kaos_content.indexing.SearchableDocument and kaos_content.indexing.SearchableCorpus; removal is scheduled for 0.3.0.
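The "per-row cosine ≥ 0.9999" claim can be read as a parity check between two embedding matrices produced for the same texts. The following is a minimal sketch of such a check in plain Python. The function names are hypothetical and this is not the project's own audit code; it assumes both matrices are non-empty, have the same shape, and contain no zero rows.

```python
import math


def row_cosines(a, b):
    """Cosine similarity between matching rows of two equally shaped matrices."""
    if len(a) != len(b):
        raise ValueError("matrices must have the same number of rows")
    sims = []
    for row_a, row_b in zip(a, b):
        dot = sum(x * y for x, y in zip(row_a, row_b))
        norm_a = math.sqrt(sum(x * x for x in row_a))
        norm_b = math.sqrt(sum(y * y for y in row_b))
        sims.append(dot / (norm_a * norm_b))
    return sims


def backends_agree(old_vecs, new_vecs, threshold=0.9999):
    """True when the worst-matching row still meets the cosine threshold."""
    return min(row_cosines(old_vecs, new_vecs)) >= threshold
```

Checking the minimum rather than the mean means a single badly drifting row fails the comparison.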
Platform coverage: per-platform cp313-abi3 wheels for Linux x86_64 +
aarch64 (manylinux + musllinux), macOS aarch64, Windows x86_64 +
aarch64. Free-threaded Python (3.13t / 3.14t) loads cleanly — no
_check_gil_enabled guard, no py_rust_stemmers SIGSEGV path.
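To see whether the current interpreter is a free-threaded build, and whether the GIL is actually disabled at runtime, the standard library exposes two checks. This is a small sketch with hypothetical helper names; it does not call into kaos-nlp-transformers.

```python
import sys
import sysconfig


def is_free_threaded_build():
    """True when CPython was compiled with Py_GIL_DISABLED (a 3.13t / 3.14t build)."""
    return bool(sysconfig.get_config_var("Py_GIL_DISABLED"))


def gil_is_enabled():
    """True when the GIL is active right now.

    sys._is_gil_enabled() exists on CPython 3.13+; older interpreters always
    run with the GIL, so the fallback is True.
    """
    check = getattr(sys, "_is_gil_enabled", None)
    return True if check is None else bool(check())


if __name__ == "__main__":
    print("free-threaded build:", is_free_threaded_build())
    print("GIL enabled:", gil_is_enabled())
```

A free-threaded build can still run with the GIL re-enabled (for example when an extension module requests it), which is why the two checks are kept separate.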
Quick start
import numpy as np
from kaos_nlp_transformers import EmbeddingModel
# Load the v0 default model (BAAI/bge-small-en-v1.5, 33M params, MIT).
# First call downloads and caches; subsequent calls are O(1).
model = EmbeddingModel.load("BAAI/bge-small-en-v1.5")
# Embed a small batch. Returns a float32 numpy array of shape (N, dim).
texts = [
    "Force majeure clauses excuse performance.",
    "Indemnity caps the liability of the seller.",
]
vecs = model.embed(texts)
assert vecs.shape == (2, 384) and vecs.dtype == np.float32
# Cosine similarity over the L2-normalized rows.
def cosine(a, b):
    return float(np.dot(a / np.linalg.norm(a), b / np.linalg.norm(b)))
print(f"sim: {cosine(vecs[0], vecs[1]):.3f}")
# sim: 0.637 (similar legal-contract topic, distinct concepts)
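The quick start notes that repeated load() calls are O(1). One way a process-level cache of that kind could work is sketched below, keyed by the (model_id, revision, device, cache_dir) tuple this README describes for backend caching. The `load_cached` helper and its `factory` argument are hypothetical and stand in for whatever expensive construction the real loader performs.

```python
import threading
from typing import Any, Callable

_CACHE: dict[tuple, Any] = {}
_CACHE_LOCK = threading.Lock()


def load_cached(model_id, revision, device, cache_dir, factory: Callable[[], Any]):
    """Build the backend once per key; later calls are a dictionary lookup."""
    key = (model_id, revision, device, cache_dir)
    with _CACHE_LOCK:  # keep concurrent callers from building the same backend twice
        if key not in _CACHE:
            _CACHE[key] = factory()
        return _CACHE[key]
```

Including the revision in the key means a pinned commit change yields a fresh backend instead of silently reusing a stale one.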
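For retrieval over a corpus, build an EmbeddingRetriever (deprecated as of 0.2.0 in favor of kaos_content.indexing.SearchableCorpus, with removal scheduled for 0.3.0):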
import asyncio
from kaos_nlp_transformers import EmbeddingRetriever
retriever = EmbeddingRetriever.from_texts(
    texts=[
        "The buyer agrees to mediation in Delaware.",
        "All disputes shall be resolved by arbitration in New York.",
        "Force majeure clauses excuse performance.",
    ],
    doc_ids=[0, 1, 2],
)
hits = asyncio.run(retriever.retrieve("where do contract disputes go?", top_k=2))
for h in hits:
    print(f"{h.score:.3f} {h.text}")
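For intuition, brute-force cosine retrieval amounts to scoring the query against every document vector and keeping the best k. The sketch below shows that idea in plain Python with toy vectors. It is not the EmbeddingRetriever implementation, and the `top_k` helper is hypothetical.

```python
import heapq
import math


def normalize(vec):
    """Scale a vector to unit L2 length (assumes it is not all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def top_k(query, doc_vecs, k):
    """Return [(doc_index, cosine_score), ...] for the k most similar documents."""
    q = normalize(query)
    scored = (
        (sum(a * b for a, b in zip(q, normalize(d))), i)
        for i, d in enumerate(doc_vecs)
    )
    # nlargest keeps only k candidates in memory while scanning every document.
    return [(i, score) for score, i in heapq.nlargest(k, scored)]


if __name__ == "__main__":
    docs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for index, score in top_k([1.0, 0.1], docs, k=2):
        print(index, f"{score:.3f}")
```

In practice the document vectors would be normalized once at index time and stored as a matrix, so each query reduces to a single matrix-vector product.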
Concepts
The package is built around a small set of typed primitives.
| Concept | What it is |
|---|---|
| EmbeddingModel | The single entry point for inference. EmbeddingModel.load(model_id, *, device=None, backend=None, settings=None) resolves the registry entry, picks a backend (fastembed for ONNX models on CPU/GPU, model2vec for static lookup models), and returns an instance with an .embed(texts, *, batch_size=32) -> np.ndarray method. Backends are process-cached by (model_id, revision, device, cache_dir) so repeated load() calls are O(1). |
| RegisteredModel / REGISTRY / EXCLUDED | Curated, license-vetted model catalog. Each entry pins a HuggingFace Hub commit SHA (audit-01 KNT-003: revisions thread through the loader cache key). The EXCLUDED map names models intentionally rejected, with their licensing reason: jina-v3 (CC-BY-NC), NV-Embed (CC-BY-NC), Qwen3-Embedding (MS MARCO ambiguity). v0 ships BAAI/bge-small-en-v1.5 (33M, MIT, fastembed) plus three model2vec entries (potion-base-8M, potion-base-32M, potion-retrieval-32M). potion-base-8M is vendored inside the wheel (~28 MB), so it loads offline with no network. |
| EmbeddingRetriever | Brute-force cosine similarity search over a numpy matrix. from_texts(...) and from_corpus(...) factories. For corpora up to ~50K documents this is faster than FAISS overhead. Implements the kaos_nlp_core.search.SearchHit protocol. |
| CrossEncoderReranker | Optional second-pass reranker via fastembed.TextCrossEncoder (default BAAI/bge-reranker-base, MIT). No extra required for CPU; [gpu] accelerates on CUDA. Use to refine EmbeddingRetriever top-50 → top-10. Sigmoid-normalized scores in [0, 1]. |
| SemanticDedupLevel | Plug-in for kaos-content's deduplication framework. Embeds documents, computes pairwise cosine distance with scipy.spatial.distance.pdist, and clusters with scipy.cluster.hierarchy.fcluster. Requires the [clustering] extra. |
| KaosNLPTransformersSettings | Typed settings (env prefix KAOS_NLP_TRANSFORMERS_): default_model, default_reranker_model, cache_dir, offline, allow_unregistered, device, backend, profile. Honors legacy HF_HUB_OFFLINE and HF_HOME. When offline=True, the load path sets HF_HUB_OFFLINE=1 and TRANSFORMERS_OFFLINE=1 (audit-01 KNT-005). |
| Device detection | detect_devices() returns a SystemDevices snapshot (reachable accelerators + ONNX execution providers + latent GPUs the OS sees but the install can't drive). EmbeddingModel.load(device="auto") picks the best available; explicit "cpu" / "cuda" / "cuda:0" / "openvino" are honored. Audit-06 KNT-501 retired mps and xla alongside the torch backend. |
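The reranker row describes a second pass that narrows a candidate list and reports sigmoid-normalized scores in [0, 1]. The sketch below illustrates that shape of computation with made-up raw scores. It assumes the cross-encoder emits one unbounded logit per candidate; the `rerank` helper is hypothetical, not the CrossEncoderReranker API.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function mapping any real number into [0, 1]."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def rerank(candidates, raw_scores, top_n):
    """Pair each candidate with its normalized score and keep the best top_n."""
    scored = sorted(
        zip(candidates, map(sigmoid, raw_scores)),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return scored[:top_n]


if __name__ == "__main__":
    for text, score in rerank(["a", "b", "c"], [-2.0, 3.0, 0.0], top_n=2):
        print(f"{score:.3f} {text}")
```

Because the sigmoid is monotonic, it changes the scale of the scores but never the ranking order.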
CLI
kaos-nlp-transformers ships a kaos-nlp-transformers administrative
CLI (info subcommand) plus a kaos-nlp-transformers-serve MCP server
launcher that requires the [mcp] extra:
kaos-nlp-transformers info --json # version + registry + device snapshot
kaos-nlp-transformers-serve # stdio MCP server (requires [mcp])
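The `info --json` output can also be consumed from a script. The sketch below shells out to the CLI and parses whatever JSON it prints. The structure of that JSON is not documented here, so the helper (a hypothetical name) returns the parsed object as-is and returns None when the executable is not on PATH.

```python
import json
import shutil
import subprocess


def read_info(executable="kaos-nlp-transformers"):
    """Run `<executable> info --json` and return the parsed JSON, or None if not installed."""
    path = shutil.which(executable)
    if path is None:
        return None
    proc = subprocess.run(
        [path, "info", "--json"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(proc.stdout)


if __name__ == "__main__":
    print(read_info())
```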
Compatibility & status
| Aspect | Status |
|---|---|
| Python | 3.13, 3.14, including free-threaded builds (3.13t / 3.14t). Shipped as per-platform cp313-abi3 wheels with the Rust cdylib inside. The earlier free-threaded guard, which existed because fastembed's transitive py_rust_stemmers and tokenizers C extensions segfaulted without the GIL, is no longer needed now that the Python fastembed wrapper is retired. |
| OS | Linux x86_64 + aarch64 (manylinux + musllinux), macOS aarch64, Windows x86_64 + aarch64. |
| Maturity | Alpha. The public API is documented in kaos_nlp_transformers.__all__. |
| Stability policy | Pre-1.0: minor bumps may change behaviour. Every change is documented in CHANGELOG.md. |
| Test coverage | 138 unit tests + 24 integration tests (162 total, 77% line coverage). Integration suite hits real fastembed embedding + cross-encoder reranker downloads — no mocks. GPU tests gated on the gpu marker; reranker live tests on live. |
| Type checker | Validated with ty, Astral's Python type checker. |
Companion packages
kaos-nlp-transformers is one of the packages in the
Kelvin Agentic OS. The broader stack:
| Package | Layer | What it does |
|---|---|---|
| kaos-core | Core | Foundational runtime, MCP-native types, registries, execution engine, VFS |
| kaos-content | Core | Typed document AST: Block/Inline, provenance, views |
| kaos-mcp | Bridge | FastMCP server, kaos management CLI, MCP resource templates |
| kaos-pdf | Extraction | PDF → AST with provenance |
| kaos-web | Extraction | Web extraction, browser automation, search, domain intelligence |
| kaos-office | Extraction | DOCX / PPTX / XLSX readers + writers to AST |
| kaos-tabular | Extraction | DuckDB-powered SQL analytics |
| kaos-source | Data | Government + financial data connectors (Federal Register, eCFR, EDGAR, GovInfo, PACER, GLEIF) |
| kaos-llm-client | LLM | Multi-provider LLM transport |
| kaos-llm-core | LLM | Typed LLM programming (Signatures, Programs, Optimizers) |
| kaos-nlp-core | Primitives (Rust) | High-performance NLP primitives |
| kaos-nlp-transformers | ML | Dense embeddings + retrieval |
| kaos-graph | Primitives (Rust) | Graph algorithms + RDF/SPARQL |
| kaos-ml-core | Primitives (Rust) | Classical ML on the document AST |
| kaos-citations | Legal | Legal citation extraction, resolution, verification |
| kaos-agents | Agentic | Agent runtime, memory, recipes |
| kaos-reference | Sample | Reference module for module authors |
Packages depend on kaos-core; everything else is opt-in. Mix and match the
ones you need.
Development
git clone https://github.com/273v/kaos-nlp-transformers
cd kaos-nlp-transformers
uv sync --group dev --extra clustering
Install pre-commit hooks (recommended — they run the same checks as CI on every commit, scoped to staged files):
uvx pre-commit install
uvx pre-commit run --all-files # one-time full sweep
Manual QA commands (the same set CI runs):
uv run ruff format --check kaos_nlp_transformers tests
uv run ruff check kaos_nlp_transformers tests
uv run ty check kaos_nlp_transformers tests
uv run pytest tests/unit -q
Build from source
uv build
uv pip install dist/*.whl
Contributing
Issues and pull requests are welcome. See CONTRIBUTING.md
for setup, quality gates, pull request expectations, and engineering
standards. By contributing you agree to follow the
project conduct expectations and certify the
Developer Certificate of Origin v1.1 —
sign every commit with git commit -s. Please open an issue before starting
on a non-trivial change so we can align on scope.
Security
For security issues, please do not file a public issue. Report privately via GitHub Private Vulnerability Reporting or email security@273ventures.com. See SECURITY.md for the full disclosure policy.
License
Apache License 2.0 — see LICENSE and NOTICE.
Copyright 2026 273 Ventures LLC. Built for kelvin.legal.