A Polars plugin for text embeddings in DataFrames

Polars FastEmbed

A Polars plugin for embedding DataFrames

Installation

pip install polars-fastembed

Polars is a required dependency, but it is not installed with the package by default. It is offered as an optional extra, which you select by naming it in square brackets:

pip install polars-fastembed[polars]          # most users can install regular Polars
pip install polars-fastembed[polars-lts-cpu]  # for backward compatibility with older CPUs
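
Because Polars is not pulled in automatically, it can be worth confirming that both packages are importable before running the demo. The following is a minimal sketch using only the standard library; the helper name missing_modules is hypothetical and not part of polars-fastembed.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # "polars" and "polars_fastembed" are the import names used in the demo.
    missing = missing_modules(["polars", "polars_fastembed"])
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("Polars and polars-fastembed are both importable.")
```

If "polars" is reported as missing, reinstalling with one of the extras above should resolve it.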

Features

  • Embed from a DataFrame by specifying the source column(s)
  • Re-order/filter rows by semantic similarity to a query
  • Efficiently reuse loaded models via a global registry (no repeated model loads)
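
To illustrate the idea behind the global registry, here is a minimal sketch of the general pattern in plain Python. It is not the plugin's implementation: the names _MODELS, register, and fake_loader are hypothetical, and the loader is a stand-in for an expensive model load.

```python
from typing import Callable, Dict

# Hypothetical module-level cache: model name -> loaded model object.
_MODELS: Dict[str, object] = {}


def register(name: str, loader: Callable[[str], object]) -> object:
    """Load a model once and reuse it on later calls with the same name."""
    if name not in _MODELS:
        _MODELS[name] = loader(name)  # the expensive step happens only here
    return _MODELS[name]


load_calls = []


def fake_loader(name: str) -> object:
    """Stand-in for a real model load; records each call so reuse is visible."""
    load_calls.append(name)
    return {"model": name}


first = register("Xenova/bge-small-en-v1.5", fake_loader)
second = register("Xenova/bge-small-en-v1.5", fake_loader)
print(first is second, len(load_calls))  # prints: True 1
```

The second call returns the cached object without invoking the loader again, which is the behaviour the feature list describes as "no repeated model loads".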

Demo

See demo.py

import polars as pl
from polars_fastembed import register_model

# Create a sample DataFrame
df = pl.DataFrame(
    {
        "id": [1, 2, 3],
        "text": [
            "Hello world",
            "Deep Learning is amazing",
            "Polars and FastEmbed are well integrated",
        ],
    }
)

model_id = "Xenova/bge-small-en-v1.5"

# 1) Register a model
#    Optionally specify GPU: providers=["CUDAExecutionProvider"]
#    Or omit it for CPU usage
register_model(model_id, providers=["CPUExecutionProvider"])

# 2) Embed your text
df_emb = df.fastembed.embed(
    columns="text",
    model_name=model_id,
    output_column="embedding",
)

# Inspect embeddings
print(df_emb)

# 3) Perform retrieval
result = df_emb.fastembed.retrieve(
    query="Tell me about deep learning",
    model_name=model_id,
    embedding_column="embedding",
    k=3,
)
print(result)

shape: (3, 3)
┌─────┬─────────────────────────────────┬─────────────────────────────────┐
│ id  ┆ text                            ┆ embedding                       │
│ --- ┆ ---                             ┆ ---                             │
│ i64 ┆ str                             ┆ array[f32, 384]                 │
╞═════╪═════════════════════════════════╪═════════════════════════════════╡
│ 1   ┆ Hello world                     ┆ [0.015196, -0.022571, … 0.0260… │
│ 2   ┆ Deep Learning is amazing        ┆ [-0.016128, -0.018325, … -0.06… │
│ 3   ┆ Polars and FastEmbed are well … ┆ [-0.086584, 0.026477, … 0.0399… │
└─────┴─────────────────────────────────┴─────────────────────────────────┘
shape: (3, 4)
┌─────┬─────────────────────────────────┬─────────────────────────────────┬────────────┐
│ id  ┆ text                            ┆ embedding                       ┆ similarity │
│ --- ┆ ---                             ┆ ---                             ┆ ---        │
│ i64 ┆ str                             ┆ array[f32, 384]                 ┆ f64        │
╞═════╪═════════════════════════════════╪═════════════════════════════════╪════════════╡
│ 2   ┆ Deep Learning is amazing        ┆ [-0.016128, -0.018325, … -0.06… ┆ 0.825373   │
│ 3   ┆ Polars and FastEmbed are well … ┆ [-0.086584, 0.026477, … 0.0399… ┆ 0.543264   │
│ 1   ┆ Hello world                     ┆ [0.015196, -0.022571, … 0.0260… ┆ 0.52316    │
└─────┴─────────────────────────────────┴─────────────────────────────────┴────────────┘

Note:

  • Running the demo downloads a 133 MB model into .fastembed_cache in your working directory
  • In the original version the embedding column was a 384-dimensional array of f64; here it is a list of f32. It will become an array as well in a future version (watch this space).
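
The retrieval step orders rows by a similarity score against the query embedding. The source does not say which metric is used; the sketch below assumes cosine similarity and uses toy 2-dimensional vectors rather than real embeddings. The helper names cosine_similarity and top_k are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, rows, k):
    """rows is a list of (id, vector); return the k best (id, score) pairs, highest first."""
    scored = [(row_id, cosine_similarity(query, vec)) for row_id, vec in rows]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 2-dimensional vectors, not real embeddings.
rows = [(1, [1.0, 0.0]), (2, [0.0, 1.0]), (3, [1.0, 1.0])]
ranked = top_k([0.0, 2.0], rows, k=2)
print([row_id for row_id, _ in ranked])  # prints: [2, 3]
```

Row 2 points in the same direction as the query and scores 1.0, row 3 scores about 0.707, and row 1 is orthogonal and is dropped by k=2.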

Contributing

Feel free to open issues or submit pull requests for improvements or bug fixes.

License

MIT License
