A Polars plugin for text embeddings in DataFrames

Polars FastEmbed

A Polars plugin for embedding DataFrames

Installation

uv pip install polars-fastembed

Or, for backward compatibility with older CPUs:

uv pip install "polars-fastembed[rtcompat]"

Features

  • Embed from a DataFrame by specifying the source column(s)
  • Re-order/filter rows by semantic similarity to a query
  • Efficiently reuse loaded models via a global registry (no repeated model loads)
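
The registry idea in the last bullet can be illustrated with a minimal sketch. This is not the plugin's actual implementation: `_MODEL_REGISTRY`, `get_or_load`, and `fake_loader` are hypothetical names used only to show the pattern of loading a model once and handing back the same object on later requests.

```python
from typing import Callable, Dict

# Hypothetical registry: maps a model name to an already-loaded model object.
_MODEL_REGISTRY: Dict[str, object] = {}


def get_or_load(name: str, loader: Callable[[str], object]) -> object:
    """Return the cached model for `name`, calling `loader` only on first use."""
    if name not in _MODEL_REGISTRY:
        _MODEL_REGISTRY[name] = loader(name)
    return _MODEL_REGISTRY[name]


load_calls = []


def fake_loader(name: str) -> dict:
    # Stand-in for an expensive model load; records each call it receives.
    load_calls.append(name)
    return {"name": name}


first = get_or_load("Xenova/bge-small-en-v1.5", fake_loader)
second = get_or_load("Xenova/bge-small-en-v1.5", fake_loader)
print(first is second, len(load_calls))  # True 1
```

The second lookup returns the cached object without calling the loader again, which is the behaviour the bullet describes as "no repeated model loads".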

Demo

See demo.py

import polars as pl
from polars_fastembed import register_model

# Create a sample DataFrame
df = pl.DataFrame(
    {
        "id": [1, 2, 3],
        "text": [
            "Hello world",
            "Deep Learning is amazing",
            "Polars and FastEmbed are well integrated",
        ],
    }
)

model_id = "Xenova/bge-small-en-v1.5"

# 1) Register a model
#    Optionally specify GPU: providers=["CUDAExecutionProvider"]
#    Or omit it for CPU usage
register_model(model_id, providers=["CPUExecutionProvider"])

# 2) Embed your text
df_emb = df.fastembed.embed(
    columns="text",
    model_name=model_id,
    output_column="embedding",
)

# Inspect embeddings
print(df_emb)

# 3) Perform retrieval
result = df_emb.fastembed.retrieve(
    query="Tell me about deep learning",
    model_name=model_id,
    embedding_column="embedding",
    k=3,
)
print(result)

Output:

shape: (3, 3)
┌─────┬─────────────────────────────────┬─────────────────────────────────┐
│ id  ┆ text                            ┆ embedding                       │
│ --- ┆ ---                             ┆ ---                             │
│ i64 ┆ str                             ┆ array[f32, 384]                 │
╞═════╪═════════════════════════════════╪═════════════════════════════════╡
│ 1   ┆ Hello world                     ┆ [0.015196, -0.022571, … 0.0260… │
│ 2   ┆ Deep Learning is amazing        ┆ [-0.016128, -0.018325, … -0.06… │
│ 3   ┆ Polars and FastEmbed are well … ┆ [-0.086584, 0.026477, … 0.0399… │
└─────┴─────────────────────────────────┴─────────────────────────────────┘
shape: (3, 4)
┌─────┬─────────────────────────────────┬─────────────────────────────────┬────────────┐
│ id  ┆ text                            ┆ embedding                       ┆ similarity │
│ --- ┆ ---                             ┆ ---                             ┆ ---        │
│ i64 ┆ str                             ┆ array[f32, 384]                 ┆ f64        │
╞═════╪═════════════════════════════════╪═════════════════════════════════╪════════════╡
│ 2   ┆ Deep Learning is amazing        ┆ [-0.016128, -0.018325, … -0.06… ┆ 0.825373   │
│ 3   ┆ Polars and FastEmbed are well … ┆ [-0.086584, 0.026477, … 0.0399… ┆ 0.543264   │
│ 1   ┆ Hello world                     ┆ [0.015196, -0.022571, … 0.0260… ┆ 0.52316    │
└─────┴─────────────────────────────────┴─────────────────────────────────┴────────────┘

Note:

  • Running the demo downloads a 133 MB model to your working directory under .fastembed_cache
  • In the original version the embedding column was a 384-dimensional array of f64; here it is a list of f32. It will become an array as well in future versions (watch this space).
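
To make the similarity column in the retrieval output easier to interpret, here is a small pure-Python sketch of ranking rows by similarity to a query vector. It assumes the score is cosine similarity, which this README does not state; `cosine_similarity` and `top_k` are illustrative helpers, not part of the polars-fastembed API, and the two-dimensional vectors stand in for real 384-dimensional embeddings.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    rows: List[Tuple[int, Sequence[float]]],
    k: int,
) -> List[Tuple[int, float]]:
    """Score every (id, vector) row against the query and keep the k best."""
    scored = [(row_id, cosine_similarity(query, vec)) for row_id, vec in rows]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy stand-ins for embeddings, keyed by row id.
rows = [(1, [1.0, 0.0]), (2, [0.0, 1.0]), (3, [1.0, 1.0])]
ranked = top_k([0.0, 2.0], rows, k=2)
print([row_id for row_id, _ in ranked])  # [2, 3]
```

Row 2 points in the same direction as the query and scores 1.0, row 3 is at 45 degrees and scores about 0.707, and row 1 is orthogonal and drops out of the top two. The demo's result is ordered the same way, highest similarity first.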

Contributing

Feel free to open issues or submit pull requests for improvements or bug fixes.

License

MIT License
