A Polars plugin for text embeddings in DataFrames

Polars FastEmbed

A Polars plugin for embedding DataFrames

Installation

pip install polars-fastembed

Polars is required but is not installed with the package by default. It is provided as an optional extra, which you select by naming it in square brackets:

pip install "polars-fastembed[polars]"          # most users can install regular Polars
pip install "polars-fastembed[polars-lts-cpu]"  # for backward compatibility with older CPUs
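
Because Polars comes from an extra, it can help to confirm which build ended up in the environment. The following is a minimal sketch that uses only the standard library; the helper name `installed_polars` is hypothetical and not part of polars-fastembed, and the distribution names are taken from the extras listed above.

```python
from importlib import metadata

# Distribution names taken from the extras listed above.
CANDIDATES = ("polars", "polars-lts-cpu")


def installed_polars() -> dict:
    """Return {distribution name: version} for each Polars build found."""
    found = {}
    for name in CANDIDATES:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # This build is not installed; try the next candidate.
            continue
    return found


builds = installed_polars()
if builds:
    print("Polars available:", builds)
else:
    print("No Polars build found; install one of the extras above.")
```

If both builds show up, they were probably installed side by side by mistake, and keeping only one is usually the safer choice.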

Features

  • Embed from a DataFrame by specifying the source column(s)
  • Re-order/filter rows by semantic similarity to a query
  • Efficiently reuse loaded models via a global registry (no repeated model loads)
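
The registry idea in the last bullet can be illustrated with a small, generic sketch. This is not the plugin's actual implementation: `_MODEL_REGISTRY`, `get_or_load`, and `fake_loader` are hypothetical names, and the stand-in loader only records how often it is called.

```python
from typing import Any, Callable, Dict

# Hypothetical registry: maps a model name to an already-loaded model object.
_MODEL_REGISTRY: Dict[str, Any] = {}


def get_or_load(name: str, loader: Callable[[str], Any]) -> Any:
    """Return the cached model for `name`, calling `loader` only on first use."""
    if name not in _MODEL_REGISTRY:
        _MODEL_REGISTRY[name] = loader(name)
    return _MODEL_REGISTRY[name]


# Stand-in loader that records each call instead of loading real weights.
load_calls = []


def fake_loader(name: str) -> dict:
    load_calls.append(name)
    return {"name": name}


first = get_or_load("Xenova/bge-small-en-v1.5", fake_loader)
second = get_or_load("Xenova/bge-small-en-v1.5", fake_loader)

# Both lookups return the same object, and the loader ran only once.
print(first is second, len(load_calls))  # True 1
```

The point of the pattern is that the expensive step (loading model weights) happens once per model name, however many times the model is used afterwards.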

Demo

See demo.py

import polars as pl
from polars_fastembed import register_model

# Create a sample DataFrame
df = pl.DataFrame(
    {
        "id": [1, 2, 3],
        "text": [
            "Hello world",
            "Deep Learning is amazing",
            "Polars and FastEmbed are well integrated",
        ],
    }
)

model_id = "Xenova/bge-small-en-v1.5"

# 1) Register a model
#    Optionally specify GPU: providers=["CUDAExecutionProvider"]
#    Or omit it for CPU usage
register_model(model_id, providers=["CPUExecutionProvider"])

# 2) Embed your text
df_emb = df.fastembed.embed(
    columns="text",
    model_name=model_id,
    output_column="embedding",
)

# Inspect embeddings
print(df_emb)

# 3) Perform retrieval
result = df_emb.fastembed.retrieve(
    query="Tell me about deep learning",
    model_name=model_id,
    embedding_column="embedding",
    k=3,
)
print(result)

Output:

shape: (3, 3)
┌─────┬─────────────────────────────────┬─────────────────────────────────┐
│ id  ┆ text                            ┆ embedding                       │
│ --- ┆ ---                             ┆ ---                             │
│ i64 ┆ str                             ┆ list[f32]                       │
╞═════╪═════════════════════════════════╪═════════════════════════════════╡
│ 1   ┆ Hello world                     ┆ [0.015196, -0.022571, … 0.0260… │
│ 2   ┆ Deep Learning is amazing        ┆ [-0.016128, -0.018325, … -0.06… │
│ 3   ┆ Polars and FastEmbed are well … ┆ [-0.086584, 0.026477, … 0.0399… │
└─────┴─────────────────────────────────┴─────────────────────────────────┘
shape: (3, 4)
┌─────┬─────────────────────────────────┬─────────────────────────────────┬────────────┐
│ id  ┆ text                            ┆ embedding                       ┆ similarity │
│ --- ┆ ---                             ┆ ---                             ┆ ---        │
│ i64 ┆ str                             ┆ list[f32]                       ┆ f64        │
╞═════╪═════════════════════════════════╪═════════════════════════════════╪════════════╡
│ 2   ┆ Deep Learning is amazing        ┆ [-0.016128, -0.018325, … -0.06… ┆ 0.825373   │
│ 3   ┆ Polars and FastEmbed are well … ┆ [-0.086584, 0.026477, … 0.0399… ┆ 0.543264   │
│ 1   ┆ Hello world                     ┆ [0.015196, -0.022571, … 0.0260… ┆ 0.52316    │
└─────┴─────────────────────────────────┴─────────────────────────────────┴────────────┘

Note: in the original version the embedding column was a 384-dimensional array of f64; here it is a list of f32. It will become an array as well in a future version (watch this space).
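
To give a feel for how a `similarity` column like the one above can be produced, here is a minimal pure-Python sketch of top-k retrieval. It assumes cosine similarity as the metric, which this README does not confirm; `cosine_similarity` and `top_k` are hypothetical helpers, and the two-dimensional vectors are toy values rather than real embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, rows, k):
    """Rank (id, vector) pairs by similarity to the query and keep the best k."""
    scored = [(row_id, cosine_similarity(query_vec, vec)) for row_id, vec in rows]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 2-dimensional "embeddings" keyed by row id.
rows = [(1, [1.0, 0.0]), (2, [0.0, 1.0]), (3, [1.0, 1.0])]
ranked = top_k([0.0, 2.0], rows, k=2)
print([row_id for row_id, _ in ranked])  # [2, 3]
```

Row 2 points in the same direction as the query, so it ranks first with a similarity of 1.0. Row 3 is at 45 degrees and ranks second, and row 1 is orthogonal to the query and is dropped by `k=2`.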

Contributing

Feel free to open issues or submit pull requests for improvements or bug fixes.

License

MIT License
