
A Polars plugin for text embeddings in DataFrames


Polars FastEmbed


A Polars plugin for embedding DataFrames

Installation

pip install polars-fastembed

Polars is required but is not installed with the package by default. It ships as an optional extra, which you select by passing its name in square brackets:

pip install "polars-fastembed[polars]"          # most users can install regular Polars
pip install "polars-fastembed[polars-lts-cpu]"  # for backward compatibility with older CPUs
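
Because Polars ships as an optional extra, it can be worth confirming that it is importable before using the plugin. The following is a minimal sketch using only the standard library; the helper name `is_installed` is hypothetical and is not part of polars-fastembed.

```python
from importlib.util import find_spec


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return find_spec(module_name) is not None


if not is_installed("polars"):
    # Hypothetical hint; pick whichever extra suits your CPU.
    print('Polars not found: try pip install "polars-fastembed[polars]"')
```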

Features

  • Embed from a DataFrame by specifying the source column(s)
  • Re-order/filter rows by semantic similarity to a query
  • Efficiently reuse loaded models via a global registry (no repeated model loads)
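
The registry idea in the last bullet can be sketched in plain Python. This illustrates the general pattern only and is not the plugin's actual implementation; `get_or_load`, `fake_loader`, and `_MODELS` are hypothetical names introduced for the example.

```python
from typing import Callable, Dict

# Hypothetical in-process registry: model name -> loaded model object.
_MODELS: Dict[str, object] = {}


def get_or_load(name: str, loader: Callable[[str], object]) -> object:
    """Return the cached model for `name`, calling `loader` only on first use."""
    if name not in _MODELS:
        _MODELS[name] = loader(name)
    return _MODELS[name]


load_calls = []


def fake_loader(name: str) -> object:
    """Stand-in for an expensive model load; records each call it receives."""
    load_calls.append(name)
    return {"name": name}


first = get_or_load("Xenova/bge-small-en-v1.5", fake_loader)
second = get_or_load("Xenova/bge-small-en-v1.5", fake_loader)
print(first is second, len(load_calls))  # True 1
```

The second lookup returns the same object without calling the loader again, which is the saving a registry is meant to provide when the same model is used for both embedding and retrieval.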

Demo

See demo.py

import polars as pl
from polars_fastembed import register_model

# Create a sample DataFrame
df = pl.DataFrame(
    {
        "id": [1, 2, 3],
        "text": [
            "Hello world",
            "Deep Learning is amazing",
            "Polars and FastEmbed are well integrated",
        ],
    }
)

model_id = "Xenova/bge-small-en-v1.5"

# 1) Register a model
#    Optionally specify GPU: providers=["CUDAExecutionProvider"]
#    Or omit it for CPU usage
register_model(model_id, providers=["CPUExecutionProvider"])

# 2) Embed your text
df_emb = df.fastembed.embed(
    columns="text",
    model_name=model_id,
    output_column="embedding",
)

# Inspect embeddings
print(df_emb)

# 3) Perform retrieval
result = df_emb.fastembed.retrieve(
    query="Tell me about deep learning",
    model_name=model_id,
    embedding_column="embedding",
    k=3,
)
print(result)

shape: (3, 3)
┌─────┬─────────────────────────────────┬─────────────────────────────────┐
│ id  ┆ text                            ┆ embedding                       │
│ --- ┆ ---                             ┆ ---                             │
│ i64 ┆ str                             ┆ list[f32]                       │
╞═════╪═════════════════════════════════╪═════════════════════════════════╡
│ 1   ┆ Hello world                     ┆ [0.015196, -0.022571, … 0.0260… │
│ 2   ┆ Deep Learning is amazing        ┆ [-0.016128, -0.018325, … -0.06… │
│ 3   ┆ Polars and FastEmbed are well … ┆ [-0.086584, 0.026477, … 0.0399… │
└─────┴─────────────────────────────────┴─────────────────────────────────┘
shape: (3, 4)
┌─────┬─────────────────────────────────┬─────────────────────────────────┬────────────┐
│ id  ┆ text                            ┆ embedding                       ┆ similarity │
│ --- ┆ ---                             ┆ ---                             ┆ ---        │
│ i64 ┆ str                             ┆ list[f32]                       ┆ f64        │
╞═════╪═════════════════════════════════╪═════════════════════════════════╪════════════╡
│ 2   ┆ Deep Learning is amazing        ┆ [-0.016128, -0.018325, … -0.06… ┆ 0.825373   │
│ 3   ┆ Polars and FastEmbed are well … ┆ [-0.086584, 0.026477, … 0.0399… ┆ 0.543264   │
│ 1   ┆ Hello world                     ┆ [0.015196, -0.022571, … 0.0260… ┆ 0.52316    │
└─────┴─────────────────────────────────┴─────────────────────────────────┴────────────┘
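
The similarity column in the second table ranks each row against the query. The source does not state which metric the plugin uses; the sketch below assumes cosine similarity, a common choice for sentence embeddings, and uses made-up 3-dimensional vectors rather than real model output. The names `cosine_similarity` and `top_k` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    rows: List[Tuple[int, Sequence[float]]],
    k: int,
) -> List[Tuple[int, float]]:
    """Score every (id, vector) row against the query and keep the k best."""
    scored = [(row_id, cosine_similarity(query, vec)) for row_id, vec in rows]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors, invented for illustration only.
rows = [(1, [1.0, 0.0, 0.0]), (2, [0.0, 1.0, 0.0]), (3, [1.0, 1.0, 0.0])]
ranked = top_k([0.0, 1.0, 0.0], rows, k=2)
print([row_id for row_id, _ in ranked])  # [2, 3]
```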

Note:

  • Running the demo downloads a 133 MB model into .fastembed_cache in your working directory
  • In the original version the embedding column was a 384-dimensional array of f64; here it is a list of f32. It will become an array as well in a future version (watch this space).
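
To see how much disk space the downloaded model takes, the cache directory can be measured with the standard library. This is a minimal sketch: `dir_size_bytes` is a hypothetical helper, and skipping symbolic links is an assumption made to avoid counting the same file twice if the cache links to shared files.

```python
from pathlib import Path


def dir_size_bytes(root: Path) -> int:
    """Sum the sizes of regular files under `root`; 0 if `root` is not a directory."""
    if not root.is_dir():
        return 0
    return sum(
        p.stat().st_size
        for p in root.rglob("*")
        # Skip symlinks so a linked file is not counted twice (assumption).
        if p.is_file() and not p.is_symlink()
    )


cache_dir = Path.cwd() / ".fastembed_cache"
print(f"{cache_dir}: {dir_size_bytes(cache_dir) / 1e6:.1f} MB")
```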

Contributing

Feel free to open issues or submit pull requests for improvements or bug fixes.

License

MIT License
