A Polars plugin for text embeddings in DataFrames

Polars FastEmbed

A Polars plugin for embedding DataFrames

Installation

uv pip install polars-fastembed

Or, for backward compatibility with older CPUs:

uv pip install polars-fastembed[rtcompat]

Features

  • Embed from a DataFrame by specifying the source column(s)
  • Re-order/filter rows by semantic similarity to a query
  • Efficiently reuse loaded models via a global registry (no repeated model loads)
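
The registry idea in the last bullet can be pictured with a minimal sketch. This is not the plugin's implementation: `ModelRegistry`, `get_or_load`, and the `fake_loader` callback are hypothetical names chosen for illustration. The only point is that a model is loaded on first use and then served from a cache keyed by its name.

```python
from typing import Any, Callable, Dict


class ModelRegistry:
    """Hypothetical cache that loads each model at most once."""

    def __init__(self) -> None:
        self._models: Dict[str, Any] = {}

    def get_or_load(self, name: str, loader: Callable[[str], Any]) -> Any:
        # Only call the (expensive) loader the first time a name is seen.
        if name not in self._models:
            self._models[name] = loader(name)
        return self._models[name]


# Module-level instance, shared by every caller in the process.
REGISTRY = ModelRegistry()

load_calls = []


def fake_loader(name: str) -> dict:
    # Stand-in for an expensive model load; records each invocation.
    load_calls.append(name)
    return {"name": name}


first = REGISTRY.get_or_load("Xenova/bge-small-en-v1.5", fake_loader)
second = REGISTRY.get_or_load("Xenova/bge-small-en-v1.5", fake_loader)
print(first is second, len(load_calls))
```

Both lookups return the same object and the loader runs only once, which is the behaviour the "no repeated model loads" claim describes.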

Demo

See demo.py

import polars as pl
from polars_fastembed import register_model

# Create a sample DataFrame
df = pl.DataFrame(
    {
        "id": [1, 2, 3],
        "text": [
            "Hello world",
            "Deep Learning is amazing",
            "Polars and FastEmbed are well integrated",
        ],
    }
)

model_id = "Xenova/bge-small-en-v1.5"

# 1) Register a model
#    Optionally specify GPU: providers=["CUDAExecutionProvider"]
#    Or omit it for CPU usage
register_model(model_id, providers=["CPUExecutionProvider"])

# 2) Embed your text
df_emb = df.fastembed.embed(
    columns="text",
    model_name=model_id,
    output_column="embedding",
)

# Inspect embeddings
print(df_emb)

# 3) Perform retrieval
result = df_emb.fastembed.retrieve(
    query="Tell me about deep learning",
    model_name=model_id,
    embedding_column="embedding",
    k=3,
)
print(result)

Output:

shape: (3, 3)
┌─────┬─────────────────────────────────┬─────────────────────────────────┐
│ id  ┆ text                            ┆ embedding                       │
│ --- ┆ ---                             ┆ ---                             │
│ i64 ┆ str                             ┆ array[f32, 384]                 │
╞═════╪═════════════════════════════════╪═════════════════════════════════╡
│ 1   ┆ Hello world                     ┆ [0.015196, -0.022571, … 0.0260… │
│ 2   ┆ Deep Learning is amazing        ┆ [-0.016128, -0.018325, … -0.06… │
│ 3   ┆ Polars and FastEmbed are well … ┆ [-0.086584, 0.026477, … 0.0399… │
└─────┴─────────────────────────────────┴─────────────────────────────────┘
shape: (3, 4)
┌─────┬─────────────────────────────────┬─────────────────────────────────┬────────────┐
│ id  ┆ text                            ┆ embedding                       ┆ similarity │
│ --- ┆ ---                             ┆ ---                             ┆ ---        │
│ i64 ┆ str                             ┆ array[f32, 384]                 ┆ f64        │
╞═════╪═════════════════════════════════╪═════════════════════════════════╪════════════╡
│ 2   ┆ Deep Learning is amazing        ┆ [-0.016128, -0.018325, … -0.06… ┆ 0.825373   │
│ 3   ┆ Polars and FastEmbed are well … ┆ [-0.086584, 0.026477, … 0.0399… ┆ 0.543264   │
│ 1   ┆ Hello world                     ┆ [0.015196, -0.022571, … 0.0260… ┆ 0.52316    │
└─────┴─────────────────────────────────┴─────────────────────────────────┴────────────┘
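
The similarity column above looks like a cosine-similarity score, although the source does not state which metric the plugin uses. Under that assumption, the retrieval step can be sketched in plain Python as follows. `cosine_similarity` and `retrieve_top_k` are hypothetical helpers, and the 3-dimensional vectors are toy values, not real embeddings.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    # Dot product divided by the product of the two vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def retrieve_top_k(
    query: Sequence[float],
    rows: List[Tuple[int, Sequence[float]]],
    k: int,
) -> List[Tuple[int, float]]:
    # Score every row against the query, then keep the k best matches.
    scored = [(row_id, cosine_similarity(query, vec)) for row_id, vec in rows]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy data: (id, embedding) pairs standing in for the DataFrame rows.
rows = [
    (1, [1.0, 0.0, 0.0]),
    (2, [0.0, 1.0, 0.0]),
    (3, [0.6, 0.8, 0.0]),
]
query = [0.0, 1.0, 0.0]

ranked = retrieve_top_k(query, rows, k=2)
print(ranked)
```

Row 2 points in the same direction as the query and ranks first, followed by row 3. This mirrors how the demo output is ordered by descending similarity.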

Note:

  • Running the demo downloads a 133 MB model into a .fastembed_cache directory under your working directory.
  • In the original version the embedding column was a 384-dimensional array of f64; here it is a list of f32. It will become an array as well in a future version (watch this space).
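
For a rough sense of what the f64 to f32 change in the note means for storage, here is a back-of-the-envelope calculation using Python's standard `array` module. It assumes the usual 4-byte C float and 8-byte C double, and it ignores any container overhead that Polars itself adds.

```python
from array import array

DIM = 384  # embedding width shown in the demo output

f32_vec = array("f", [0.0] * DIM)  # C float, 4 bytes on common platforms
f64_vec = array("d", [0.0] * DIM)  # C double, 8 bytes on common platforms

# Raw payload size of one embedding in each precision.
f32_bytes = f32_vec.itemsize * len(f32_vec)
f64_bytes = f64_vec.itemsize * len(f64_vec)

print(f32_bytes, f64_bytes)
```

On typical platforms this works out to 1,536 bytes per f32 embedding against 3,072 bytes per f64 embedding, so the raw payload is halved.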

Contributing

Feel free to open issues or submit pull requests for improvements or bug fixes.

License

MIT License

