A Polars plugin for text embeddings in DataFrames

Polars FastEmbed

A Polars plugin for embedding DataFrames

Installation

uv pip install polars-fastembed

Or, for backward compatibility with older CPUs:

uv pip install polars-fastembed[rtcompat]
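
To confirm that the install succeeded, one option is to query the installed distribution metadata from Python. This is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and is not part of polars-fastembed.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string if the package is installed, otherwise None.
print(installed_version("polars-fastembed"))
```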

Features

  • Embed from a DataFrame by specifying the source column(s)
  • Re-order/filter rows by semantic similarity to a query
  • Efficiently reuse loaded models via a global registry (no repeated model loads)

Demo

See demo.py

import polars as pl
from polars_fastembed import register_model

# Create a sample DataFrame
df = pl.DataFrame(
    {
        "id": [1, 2, 3],
        "text": [
            "Hello world",
            "Deep Learning is amazing",
            "Polars and FastEmbed are well integrated",
        ],
    }
)

model_id = "Xenova/bge-small-en-v1.5"

# 1) Register a model
#    Optionally specify GPU: providers=["CUDAExecutionProvider"]
#    Or omit it for CPU usage
register_model(model_id, providers=["CPUExecutionProvider"])

# 2) Embed your text
df_emb = df.fastembed.embed(
    columns="text",
    model_name=model_id,
    output_column="embedding",
)

# Inspect embeddings
print(df_emb)

# 3) Perform retrieval
result = df_emb.fastembed.retrieve(
    query="Tell me about deep learning",
    model_name=model_id,
    embedding_column="embedding",
    k=3,
)
print(result)

This prints:

shape: (3, 3)
┌─────┬─────────────────────────────────┬─────────────────────────────────┐
│ id  ┆ text                            ┆ embedding                       │
│ --- ┆ ---                             ┆ ---                             │
│ i64 ┆ str                             ┆ array[f32, 384]                 │
╞═════╪═════════════════════════════════╪═════════════════════════════════╡
│ 1   ┆ Hello world                     ┆ [0.015196, -0.022571, … 0.0260… │
│ 2   ┆ Deep Learning is amazing        ┆ [-0.016128, -0.018325, … -0.06… │
│ 3   ┆ Polars and FastEmbed are well … ┆ [-0.086584, 0.026477, … 0.0399… │
└─────┴─────────────────────────────────┴─────────────────────────────────┘
shape: (3, 4)
┌─────┬─────────────────────────────────┬─────────────────────────────────┬────────────┐
│ id  ┆ text                            ┆ embedding                       ┆ similarity │
│ --- ┆ ---                             ┆ ---                             ┆ ---        │
│ i64 ┆ str                             ┆ array[f32, 384]                 ┆ f64        │
╞═════╪═════════════════════════════════╪═════════════════════════════════╪════════════╡
│ 2   ┆ Deep Learning is amazing        ┆ [-0.016128, -0.018325, … -0.06… ┆ 0.825373   │
│ 3   ┆ Polars and FastEmbed are well … ┆ [-0.086584, 0.026477, … 0.0399… ┆ 0.543264   │
│ 1   ┆ Hello world                     ┆ [0.015196, -0.022571, … 0.0260… ┆ 0.52316    │
└─────┴─────────────────────────────────┴─────────────────────────────────┴────────────┘
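
The retrieval step above ranks rows by a similarity score against the query. The README does not state which metric is used; the sketch below assumes cosine similarity, a common choice for sentence embeddings, and uses tiny hand-made vectors in place of real 384-dimensional embeddings. The functions `cosine_similarity` and `top_k` are illustrative and are not part of the plugin's API.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    rows: Sequence[Tuple[int, Sequence[float]]],
    k: int,
) -> List[Tuple[int, float]]:
    """Score every (id, vector) row against the query and keep the k best."""
    scored = [(row_id, cosine_similarity(query, vec)) for row_id, vec in rows]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 2-dimensional "embeddings" keyed by row id (assumed data, for illustration).
rows = [(1, [1.0, 0.0]), (2, [0.0, 1.0]), (3, [1.0, 1.0])]
ranked = top_k([0.0, 2.0], rows, k=3)
for row_id, score in ranked:
    print(row_id, round(score, 4))
```

As in the demo output, rows come back sorted from most to least similar, and `k` caps how many are returned.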

Note:

  • Running the demo downloads a 133 MB model into .fastembed_cache in your working directory.
  • In the original version the embedding column was a 384-dimensional array of f64, and here it is a list of f32. It will become an array as well in future versions (watch this space).

Contributing

Feel free to open issues or submit pull requests for improvements or bug fixes.

License

MIT License

Download files

Source distribution

  • polars_fastembed-0.1.7.tar.gz (109.5 kB)

Built distributions

  • polars_fastembed-0.1.7-pp310-pypy310_pp73-manylinux_2_28_aarch64.whl (7.2 MB): PyPy, manylinux (glibc 2.28+), ARM64
  • polars_fastembed-0.1.7-pp39-pypy39_pp73-manylinux_2_28_aarch64.whl (7.2 MB): PyPy, manylinux (glibc 2.28+), ARM64
  • polars_fastembed-0.1.7-cp38-abi3-win_amd64.whl (5.2 MB): CPython 3.8+, Windows x86-64
  • polars_fastembed-0.1.7-cp38-abi3-manylinux_2_28_i686.whl (7.4 MB): CPython 3.8+, manylinux (glibc 2.28+), i686
  • polars_fastembed-0.1.7-cp38-abi3-manylinux_2_28_aarch64.whl (7.2 MB): CPython 3.8+, manylinux (glibc 2.28+), ARM64
  • polars_fastembed-0.1.7-cp38-abi3-macosx_11_0_arm64.whl (4.6 MB): CPython 3.8+, macOS 11.0+ ARM64
  • polars_fastembed-0.1.7-cp38-abi3-macosx_10_12_x86_64.whl (5.3 MB): CPython 3.8+, macOS 10.12+ x86-64
