A Polars plugin for text embeddings in DataFrames

Polars FastEmbed

A Polars plugin for embedding DataFrames

Installation

uv pip install "polars-fastembed[cpu]"

Change the [cpu] package extras according to your system:

  • for use with a GPU: [cuda]
  • for backward compatibility with older CPUs: [rtcompat]
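
To confirm which version ended up in the active environment after installing, a small standard-library check can help. This is only a sketch: the helper name `installed_version` is made up for illustration, and it assumes the distribution name is `polars-fastembed`, matching the install command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Hypothetical usage: report whether the plugin is visible to this interpreter.
found = installed_version("polars-fastembed")
print(found if found else "polars-fastembed is not installed")
```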

Features

  • Embed from a DataFrame by specifying the source column(s)
  • Re-order/filter rows by semantic similarity to a query
  • Efficiently reuse loaded models via a global registry (no repeated model loads)
  • Auto-detects GPU availability; uses CUDA if installed, otherwise CPU
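
The idea behind a global model registry can be sketched in a few lines. This is not the plugin's implementation, only an illustration of the caching pattern: `_MODEL_REGISTRY`, `get_or_load`, and `fake_loader` are hypothetical names, and the loader stands in for an expensive model load.

```python
from typing import Callable, Dict, List

# Hypothetical registry: maps a model ID to an already-loaded model object.
_MODEL_REGISTRY: Dict[str, object] = {}


def get_or_load(model_id: str, loader: Callable[[str], object]) -> object:
    """Return the cached model for model_id, calling loader only on first use."""
    if model_id not in _MODEL_REGISTRY:
        _MODEL_REGISTRY[model_id] = loader(model_id)
    return _MODEL_REGISTRY[model_id]


load_calls: List[str] = []


def fake_loader(model_id: str) -> dict:
    # Stand-in for an expensive model load; records each call it receives.
    load_calls.append(model_id)
    return {"model_id": model_id}


first = get_or_load("Xenova/bge-small-en-v1.5", fake_loader)
second = get_or_load("Xenova/bge-small-en-v1.5", fake_loader)

# Both lookups return the same object, and the loader ran only once.
print(first is second, len(load_calls))
```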

Demo

See demo.py

import polars as pl
from polars_fastembed import register_model, CUDA_AVAILABLE

# Create a sample DataFrame
df = pl.DataFrame(
    {
        "id": [1, 2, 3],
        "text": [
            "Hello world",
            "Deep Learning is amazing",
            "Polars and FastEmbed are well integrated",
        ],
    }
)

model_id = "Xenova/bge-small-en-v1.5"

# 1) Register a model
#    By default, uses GPU if available, otherwise CPU
register_model(model_id)

#    Or explicitly control providers:
#    register_model(model_id, cuda=False)        # Force CPU only
#    register_model(model_id, cuda=True)         # Require GPU (error if unavailable)

# 2) Embed your text
df_emb = df.fastembed.embed(
    columns="text",
    model_name=model_id,
    output_column="embedding",
)

# Inspect embeddings
print(f"CUDA available: {CUDA_AVAILABLE}")
print(df_emb)

# 3) Perform retrieval
result = df_emb.fastembed.retrieve(
    query="Tell me about deep learning",
    model_name=model_id,
    embedding_column="embedding",
    k=3,
)
print(result)

shape: (3, 3)
┌─────┬─────────────────────────────────┬─────────────────────────────────┐
│ id  ┆ text                            ┆ embedding                       │
│ --- ┆ ---                             ┆ ---                             │
│ i64 ┆ str                             ┆ array[f32, 384]                 │
╞═════╪═════════════════════════════════╪═════════════════════════════════╡
│ 1   ┆ Hello world                     ┆ [0.015196, -0.022571, … 0.0260… │
│ 2   ┆ Deep Learning is amazing        ┆ [-0.016128, -0.018325, … -0.06… │
│ 3   ┆ Polars and FastEmbed are well … ┆ [-0.086584, 0.026477, … 0.0399… │
└─────┴─────────────────────────────────┴─────────────────────────────────┘
shape: (3, 4)
┌─────┬─────────────────────────────────┬─────────────────────────────────┬────────────┐
│ id  ┆ text                            ┆ embedding                       ┆ similarity │
│ --- ┆ ---                             ┆ ---                             ┆ ---        │
│ i64 ┆ str                             ┆ array[f32, 384]                 ┆ f64        │
╞═════╪═════════════════════════════════╪═════════════════════════════════╪════════════╡
│ 2   ┆ Deep Learning is amazing        ┆ [-0.016128, -0.018325, … -0.06… ┆ 0.825373   │
│ 3   ┆ Polars and FastEmbed are well … ┆ [-0.086584, 0.026477, … 0.0399… ┆ 0.543264   │
│ 1   ┆ Hello world                     ┆ [0.015196, -0.022571, … 0.0260… ┆ 0.52316    │
└─────┴─────────────────────────────────┴─────────────────────────────────┴────────────┘
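
The similarity column above ranks rows against the query. A minimal pure-Python sketch of that kind of ranking follows, assuming the score is cosine similarity (the README does not state which metric is used). The helper names `cosine_similarity` and `top_k` and the toy 2-dimensional vectors are illustrative only.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: Sequence[float],
    rows: Sequence[Tuple[int, Sequence[float]]],
    k: int,
) -> List[Tuple[int, float]]:
    """Score every (id, embedding) row against the query and keep the k best."""
    scored = [(row_id, cosine_similarity(query_vec, vec)) for row_id, vec in rows]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 2-dimensional "embeddings" keyed by row id.
rows = [(1, [1.0, 0.0]), (2, [0.0, 1.0]), (3, [1.0, 1.0])]
ranked = top_k([0.0, 2.0], rows, k=2)
print(ranked)
```

Row 2 points in the same direction as the query and ranks first; row 3 is at 45 degrees to it and ranks second.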

Note:

  • Running the demo downloads a 133 MB model into .fastembed_cache in your working directory.
  • In the original version the embedding column was a 384-dimensional array of f64; here it is a list of f32. It will become an array as well in future versions (watch this space).

Contributing

Feel free to open issues or submit pull requests for improvements or bug fixes.

License

MIT License

Download files

Source Distribution

  • polars_fastembed-0.1.17.tar.gz (123.7 kB)

Built Distributions

  • polars_fastembed-0.1.17-cp38-abi3-win_amd64.whl (8.2 MB): CPython 3.8+, Windows x86-64
  • polars_fastembed-0.1.17-cp38-abi3-manylinux_2_28_i686.whl (15.8 MB): CPython 3.8+, manylinux (glibc 2.28+), i686
  • polars_fastembed-0.1.17-cp38-abi3-macosx_11_0_arm64.whl (4.7 MB): CPython 3.8+, macOS 11.0+, ARM64
  • polars_fastembed-0.1.17-cp38-abi3-macosx_10_12_x86_64.whl (5.4 MB): CPython 3.8+, macOS 10.12+, x86-64
