A Polars plugin for fast lexical text embeddings

Polars Luxical

A high-performance Polars plugin for Luxical text embeddings, implemented in Rust.

Overview

This plugin provides Luxical embeddings directly within Polars expressions. Luxical combines:

  • Subword tokenization (BERT uncased)
  • N-gram feature extraction with TF-IDF weighting
  • Sparse-to-dense neural network projection via knowledge distillation
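The three stages above can be illustrated with a toy pipeline. This is a minimal pure-Python sketch, not the plugin's Rust implementation. A lowercase whitespace split stands in for the BERT uncased WordPiece tokenizer, and the IDF table and projection weights are made-up toy values. A single linear layer followed by L2 normalization (both assumptions here) stands in for the distilled network. The names `ngrams`, `tfidf_features`, and `project` are hypothetical.

```python
import math
from collections import Counter

def ngrams(tokens, max_n=2):
    """Yield every contiguous n-gram of length 1..max_n as a tuple."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])

def tfidf_features(tokens, idf, max_n=2):
    """Sparse TF-IDF vector {ngram: count * idf}; n-grams outside the vocabulary are dropped."""
    counts = Counter(g for g in ngrams(tokens, max_n) if g in idf)
    return {g: c * idf[g] for g, c in counts.items()}

def project(sparse, weights, dim):
    """Sparse-to-dense step: weighted sum of each active n-gram's weight row, then L2-normalize."""
    dense = [0.0] * dim
    for g, value in sparse.items():
        for j, w in enumerate(weights[g]):
            dense[j] += value * w
    norm = math.sqrt(sum(x * x for x in dense)) or 1.0  # avoid dividing by zero
    return [x / norm for x in dense]

# Toy example (hypothetical vocabulary and weights, 2 dims instead of 192)
tokens = "Polars and Rust are fast".lower().split()  # stand-in for WordPiece tokens
idf = {("polars",): 1.0, ("rust",): 2.0, ("rust", "are"): 0.5}
weights = {("polars",): [1.0, 0.0], ("rust",): [0.0, 1.0], ("rust", "are"): [0.0, 0.0]}
sparse = tfidf_features(tokens, idf)
embedding = project(sparse, weights, dim=2)
```

The key property this shows is that the expensive part of a transformer (attention over the whole sequence) is replaced by a sparse lookup-and-sum, which is why the approach is cheap on CPU.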

Luxical models achieve dramatically higher throughput than transformer-based embedding models while maintaining competitive quality for document-level similarity tasks like clustering, classification, and semantic deduplication.

Note that Luxical models were not trained on queries, so they are not suitable for query-based search. The benchmarks demonstrate this: search results are returned quickly but are not useful.

Installation

pip install polars-luxical

Or build from source (requires a Rust toolchain and maturin):

maturin develop --release

Model Download

Models are downloaded automatically from the Hugging Face Hub on first use and cached locally.

Cache locations:

  • Linux: ~/.cache/polars-luxical/
  • macOS: ~/Library/Caches/polars-luxical/
  • Windows: C:\Users\<User>\AppData\Local\polars-luxical\
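To locate downloaded models on disk, the default locations above can be computed per platform. This is a hypothetical helper, not part of the polars_luxical API. It assumes the defaults listed above with no environment-variable overrides; whether the plugin honors variables such as XDG_CACHE_HOME is not documented here.

```python
import sys
from pathlib import Path
from typing import Optional

def luxical_cache_dir(platform: str = sys.platform, home: Optional[Path] = None) -> Path:
    """Return the default polars-luxical cache directory listed above for a platform."""
    home = home if home is not None else Path.home()
    if platform.startswith("linux"):
        return home / ".cache" / "polars-luxical"
    if platform == "darwin":  # macOS
        return home / "Library" / "Caches" / "polars-luxical"
    if platform == "win32":  # C:\Users\<User>\AppData\Local\polars-luxical
        return home / "AppData" / "Local" / "polars-luxical"
    raise ValueError(f"unsupported platform: {platform!r}")
```

Calling `luxical_cache_dir()` with no arguments uses the current interpreter's platform and home directory. Deleting that directory should cause models to be downloaded again the next time they are registered.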

To use a local model file instead, pass its path:

from polars_luxical import register_model

register_model("/path/to/your/model")

Both .safetensors and .npz formats are supported.
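The README does not describe how the plugin tells the two formats apart. As a hypothetical pre-flight check, the sketch below sniffs a file's leading bytes using the published layouts of each format. A .npz file is a ZIP archive, so it begins with the local-file signature `PK\x03\x04`. A .safetensors file starts with an 8-byte little-endian header length followed by a JSON header. `sniff_model_format` is not part of polars_luxical.

```python
import json
import struct

def sniff_model_format(data: bytes) -> str:
    """Guess whether raw file bytes look like a .safetensors or .npz model file."""
    if data[:4] == b"PK\x03\x04":
        # .npz files are ZIP archives of .npy arrays
        return "npz"
    if len(data) >= 8:
        # safetensors: u64 little-endian header length, then a JSON object header
        (header_len,) = struct.unpack("<Q", data[:8])
        header = data[8:8 + header_len]
        if len(header) == header_len:
            try:
                if isinstance(json.loads(header), dict):
                    return "safetensors"
            except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
                pass
    return "unknown"
```

For a file on disk, pass `Path(path).read_bytes()` (from `pathlib`) before calling `register_model` on the same path.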

Usage

import polars as pl
from polars_luxical import register_model, embed_text

# Register a Luxical model (downloads and caches automatically)
register_model("DatologyAI/luxical-one")

# Create a DataFrame
df = pl.DataFrame({
    "id": [1, 2, 3],
    "text": [
        "Hello world",
        "Machine learning is fascinating",
        "Polars and Rust are fast",
    ],
})

# Embed text
df_emb = df.with_columns(
    embed_text("text", model_id="DatologyAI/luxical-one").alias("embedding")
)
print(df_emb)

# Or use the namespace API
df_emb = df.luxical.embed(
    columns="text",
    model_name="DatologyAI/luxical-one",
    output_column="embedding",
)

# Retrieve similar documents
results = df_emb.luxical.retrieve(
    query="Tell me about speed",
    model_name="DatologyAI/luxical-one",
    embedding_column="embedding",
    k=3,
)
print(results)

Available Models

Model ID                 Description                                                            Embedding dim
DatologyAI/luxical-one   English web documents, distilled from snowflake-arctic-embed-m-v2.0    192

Performance

Because Luxical embeddings avoid transformer inference entirely, their throughput can be up to ~100x higher than that of large transformer embedding models (e.g., Qwen3-0.6B) and significantly higher than that of smaller models such as MiniLM-L6-v2, particularly on CPU.

For benchmarks and methodology, see the Luxical technical report.
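Throughput depends on hardware, batch size, and document length, so it is worth measuring on your own data. The helper below is a minimal, hypothetical timing harness (not the technical report's methodology). It reports documents per second for any embedding callable, taking the best of several runs to reduce warm-up noise.

```python
import time
from typing import Callable, Sequence

def docs_per_second(embed: Callable[[Sequence[str]], object],
                    docs: Sequence[str], repeats: int = 3) -> float:
    """Best-of-`repeats` throughput of `embed` over `docs`, in documents per second."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        embed(docs)  # the result is discarded; only wall-clock time matters
        best = min(best, time.perf_counter() - start)
    return len(docs) / best if best > 0 else float("inf")
```

After registering a model, one way to call it is `docs_per_second(lambda d: pl.DataFrame({"text": d}).with_columns(embed_text("text", model_id="DatologyAI/luxical-one")), docs)`. Run the same `docs` through other models for a like-for-like comparison.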

API Reference

Functions

register_model(model_name: str, providers: list[str] | None = None) -> None

Register/load a Luxical model into the global registry. If already loaded, this is a no-op.

  • model_name: Hugging Face Hub model ID (e.g., "DatologyAI/luxical-one") or local path.
  • providers: Ignored (kept for API compatibility).

embed_text(expr, *, model_id: str | None = None) -> pl.Expr

Embed text using a Luxical model.

  • expr: Column expression containing text to embed.
  • model_id: Model name/ID. If None, uses the default model.

clear_registry() -> None

Clear all loaded models from the registry (frees memory).

list_models() -> list[str]

Return a list of currently loaded model names.
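The registry contract described above can be modeled in a few lines: registration is idempotent, listing returns loaded names, and clearing drops everything. This toy class only illustrates the documented behavior. The real registry lives in the Rust extension, and `loader` is a hypothetical placeholder for downloading and parsing model weights.

```python
from typing import Callable, Dict, List

class ModelRegistry:
    """Toy model of register_model / list_models / clear_registry semantics."""

    def __init__(self, loader: Callable[[str], object]):
        self._loader = loader  # hypothetical: fetches and parses a model by name
        self._models: Dict[str, object] = {}

    def register_model(self, model_name: str) -> None:
        if model_name in self._models:
            return  # already loaded: no-op, so the loader is not called again
        self._models[model_name] = self._loader(model_name)

    def list_models(self) -> List[str]:
        return list(self._models)

    def clear_registry(self) -> None:
        self._models.clear()  # drop references so memory can be reclaimed
```

The idempotent `register_model` is what makes it safe to call at the top of every script or notebook cell without paying the load cost twice.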

DataFrame Namespace

df.luxical.embed(columns, model_name, output_column="embedding", join_columns=True)

Embed text from the specified column(s).

df.luxical.retrieve(query, model_name, embedding_column="embedding", k=None, threshold=None, similarity_metric="cosine", add_similarity_column=True)

Retrieve rows most similar to a query.
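The README does not spell out how `retrieve` combines `k` and `threshold`. The sketch below assumes the cosine metric. It keeps rows whose similarity is at least `threshold` (when given), sorts them from most to least similar, and truncates to `k` (when given). `cosine` and `retrieve` here are hypothetical pure-Python stand-ins operating on plain lists, not the plugin's implementation.

```python
import math
from typing import List, Optional, Sequence, Tuple

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_vec: Sequence[float], embeddings: Sequence[Sequence[float]],
             k: Optional[int] = None,
             threshold: Optional[float] = None) -> List[Tuple[int, float]]:
    """Return (row_index, similarity) pairs, most similar first."""
    scored = [(i, cosine(query_vec, e)) for i, e in enumerate(embeddings)]
    if threshold is not None:
        scored = [(i, s) for i, s in scored if s >= threshold]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored if k is None else scored[:k]
```

Passing an existing row's embedding as `query_vec` turns this into a near-duplicate lookup, which fits the document-level similarity tasks the models target.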

License

Apache 2.0
