

Polars FastEmbed


A Polars plugin for computing text embeddings in DataFrames

Installation

uv pip install polars-fastembed

Or, for compatibility with older CPUs:

uv pip install "polars-fastembed[rtcompat]"

Features

  • Embed from a DataFrame by specifying the source column(s)
  • Re-order/filter rows by semantic similarity to a query
  • Efficiently reuse loaded models via a global registry (no repeated model loads)
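The global registry in the last bullet follows a familiar caching pattern: load each model once, keyed by its name, and hand back the same object on every later request. The sketch below illustrates that pattern in plain Python. `ModelRegistry` and `FakeModel` are hypothetical stand-ins for illustration only, not the plugin's actual internals.

```python
from typing import Callable, Dict


class ModelRegistry:
    """Hypothetical cache that loads each model at most once per process."""

    def __init__(self, loader: Callable[[str], object]) -> None:
        self._loader = loader  # function that performs the expensive load by name
        self._models: Dict[str, object] = {}
        self.load_count = 0  # number of real loads performed so far

    def get(self, name: str) -> object:
        # Load on the first request for this name, then reuse the cached instance.
        if name not in self._models:
            self._models[name] = self._loader(name)
            self.load_count += 1
        return self._models[name]


class FakeModel:
    """Hypothetical placeholder for an expensive-to-load embedding model."""

    def __init__(self, name: str) -> None:
        self.name = name


registry = ModelRegistry(FakeModel)
first = registry.get("Xenova/bge-small-en-v1.5")
second = registry.get("Xenova/bge-small-en-v1.5")
print(first is second, registry.load_count)  # True 1
```

The second lookup returns the already-loaded object, so the load cost is paid only once no matter how many times the model is used.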

Demo

See demo.py

import polars as pl
from polars_fastembed import register_model

# Create a sample DataFrame
df = pl.DataFrame(
    {
        "id": [1, 2, 3],
        "text": [
            "Hello world",
            "Deep Learning is amazing",
            "Polars and FastEmbed are well integrated",
        ],
    }
)

model_id = "Xenova/bge-small-en-v1.5"

# 1) Register a model
#    Optionally specify GPU: providers=["CUDAExecutionProvider"]
#    Or omit it for CPU usage
register_model(model_id, providers=["CPUExecutionProvider"])

# 2) Embed your text
df_emb = df.fastembed.embed(
    columns="text",
    model_name=model_id,
    output_column="embedding",
)

# Inspect embeddings
print(df_emb)

# 3) Perform retrieval
result = df_emb.fastembed.retrieve(
    query="Tell me about deep learning",
    model_name=model_id,
    embedding_column="embedding",
    k=3,
)
print(result)

Output:

shape: (3, 3)
┌─────┬─────────────────────────────────┬─────────────────────────────────┐
│ id  ┆ text                            ┆ embedding                       │
│ --- ┆ ---                             ┆ ---                             │
│ i64 ┆ str                             ┆ array[f32, 384]                 │
╞═════╪═════════════════════════════════╪═════════════════════════════════╡
│ 1   ┆ Hello world                     ┆ [0.015196, -0.022571, … 0.0260… │
│ 2   ┆ Deep Learning is amazing        ┆ [-0.016128, -0.018325, … -0.06… │
│ 3   ┆ Polars and FastEmbed are well … ┆ [-0.086584, 0.026477, … 0.0399… │
└─────┴─────────────────────────────────┴─────────────────────────────────┘
shape: (3, 4)
┌─────┬─────────────────────────────────┬─────────────────────────────────┬────────────┐
│ id  ┆ text                            ┆ embedding                       ┆ similarity │
│ --- ┆ ---                             ┆ ---                             ┆ ---        │
│ i64 ┆ str                             ┆ array[f32, 384]                 ┆ f64        │
╞═════╪═════════════════════════════════╪═════════════════════════════════╪════════════╡
│ 2   ┆ Deep Learning is amazing        ┆ [-0.016128, -0.018325, … -0.06… ┆ 0.825373   │
│ 3   ┆ Polars and FastEmbed are well … ┆ [-0.086584, 0.026477, … 0.0399… ┆ 0.543264   │
│ 1   ┆ Hello world                     ┆ [0.015196, -0.022571, … 0.0260… ┆ 0.52316    │
└─────┴─────────────────────────────────┴─────────────────────────────────┴────────────┘
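The `similarity` column ranks each row by how close its stored embedding is to the embedding of the query. This README does not say which metric `retrieve` uses; the sketch below assumes cosine similarity, a common choice for BGE-style models, and uses tiny made-up 3-dimensional vectors in place of real 384-dimensional embeddings. `cosine` and `top_k` are hypothetical helpers, not part of the plugin's API.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity: dot product divided by the product of the vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float], rows: List[Tuple[int, Sequence[float]]], k: int
) -> List[Tuple[int, float]]:
    """Score every (id, embedding) row against the query; keep the k best, highest first."""
    scored = [(row_id, cosine(query, emb)) for row_id, emb in rows]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy data: made-up 3-d vectors standing in for real embeddings.
rows = [
    (1, [1.0, 0.0, 0.0]),
    (2, [0.0, 1.0, 0.0]),
    (3, [0.6, 0.8, 0.0]),
]
query = [0.0, 1.0, 0.0]

result = top_k(query, rows, k=2)
for row_id, score in result:
    print(row_id, round(score, 3))
```

In the demo, `k=3` equals the number of rows, so every row is returned, just re-ordered by similarity; a smaller `k` also filters out the weaker matches.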

Note:

  • The first run downloads the model (about 133 MB) into a .fastembed_cache directory in your working directory.
  • In the original version, embeddings were a 384-dimensional array of f64; here they are f32. As the output above shows, they are returned as a fixed-size array[f32, 384] column.
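Storing embeddings as f32 rather than f64 halves their raw memory footprint. The quick calculation below uses Python's standard `array` module (type codes `"f"` and `"d"`, which are 4 and 8 bytes per element on common platforms) to size one 384-dimensional vector. It counts raw vector data only and ignores any per-column overhead in Polars; the one-million-row table is a hypothetical example.

```python
from array import array

DIM = 384  # embedding width shown in the demo output (array[f32, 384])

f32_vec = array("f", [0.0] * DIM)  # C float: 4 bytes per element on common platforms
f64_vec = array("d", [0.0] * DIM)  # C double: 8 bytes per element

f32_bytes = f32_vec.itemsize * len(f32_vec)
f64_bytes = f64_vec.itemsize * len(f64_vec)

print(f32_bytes, f64_bytes)  # 1536 3072

# Scale up to a hypothetical one-million-row table (raw vector data only).
rows = 1_000_000
print(f"{f32_bytes * rows // 10**6} MB vs {f64_bytes * rows // 10**6} MB")  # 1536 MB vs 3072 MB
```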

Contributing

Feel free to open issues or submit pull requests for improvements or bug fixes.

License

MIT License


