A Polars plugin for text embeddings in DataFrames

Polars FastEmbed

A Polars plugin for embedding DataFrames

Installation

uv pip install polars-fastembed

Or, for backward compatibility with older CPUs:

uv pip install "polars-fastembed[rtcompat]"
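
To confirm that the install landed in the active environment, a quick check like the following may help. It is only a sketch using the standard library; the helper name `is_installed` is made up for illustration, and `polars_fastembed` is the import name used in the demo below.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Report whether Polars and the plugin are importable from this interpreter.
    for name in ("polars", "polars_fastembed"):
        status = "found" if is_installed(name) else "missing"
        print(f"{name}: {status}")
```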

Features

  • Embed from a DataFrame by specifying the source column(s)
  • Re-order/filter rows by semantic similarity to a query
  • Efficiently reuse loaded models via a global registry (no repeated model loads)
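
The registry idea in the last bullet can be pictured with a small cache keyed by model ID. This is a sketch of the general pattern only, not the plugin's actual implementation: `ModelRegistry`, `fake_loader`, and `load_calls` are hypothetical names introduced for illustration.

```python
from typing import Any, Callable, Dict, List


class ModelRegistry:
    """Hypothetical cache mapping a model ID to an already-loaded model."""

    def __init__(self) -> None:
        self._models: Dict[str, Any] = {}

    def register(self, model_id: str, loader: Callable[[str], Any]) -> Any:
        # Load only on first registration; later calls return the cached object.
        if model_id not in self._models:
            self._models[model_id] = loader(model_id)
        return self._models[model_id]

    def get(self, model_id: str) -> Any:
        # Raises KeyError if the model was never registered.
        return self._models[model_id]


load_calls: List[str] = []


def fake_loader(model_id: str) -> dict:
    """Stand-in for an expensive model load; records each call."""
    load_calls.append(model_id)
    return {"name": model_id}


registry = ModelRegistry()
first = registry.register("Xenova/bge-small-en-v1.5", fake_loader)
second = registry.register("Xenova/bge-small-en-v1.5", fake_loader)
print(first is second, len(load_calls))  # True 1
```

Registering the same ID twice returns the same object, so the expensive load happens once per process.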

Demo

See demo.py

import polars as pl
from polars_fastembed import register_model

# Create a sample DataFrame
df = pl.DataFrame(
    {
        "id": [1, 2, 3],
        "text": [
            "Hello world",
            "Deep Learning is amazing",
            "Polars and FastEmbed are well integrated",
        ],
    }
)

model_id = "Xenova/bge-small-en-v1.5"

# 1) Register a model
#    Optionally specify GPU: providers=["CUDAExecutionProvider"]
#    Or omit it for CPU usage
register_model(model_id, providers=["CPUExecutionProvider"])

# 2) Embed your text
df_emb = df.fastembed.embed(
    columns="text",
    model_name=model_id,
    output_column="embedding",
)

# Inspect embeddings
print(df_emb)

# 3) Perform retrieval
result = df_emb.fastembed.retrieve(
    query="Tell me about deep learning",
    model_name=model_id,
    embedding_column="embedding",
    k=3,
)
print(result)

Output:

shape: (3, 3)
┌─────┬─────────────────────────────────┬─────────────────────────────────┐
│ id  ┆ text                            ┆ embedding                       │
│ --- ┆ ---                             ┆ ---                             │
│ i64 ┆ str                             ┆ array[f32, 384]                 │
╞═════╪═════════════════════════════════╪═════════════════════════════════╡
│ 1   ┆ Hello world                     ┆ [0.015196, -0.022571, … 0.0260… │
│ 2   ┆ Deep Learning is amazing        ┆ [-0.016128, -0.018325, … -0.06… │
│ 3   ┆ Polars and FastEmbed are well … ┆ [-0.086584, 0.026477, … 0.0399… │
└─────┴─────────────────────────────────┴─────────────────────────────────┘
shape: (3, 4)
┌─────┬─────────────────────────────────┬─────────────────────────────────┬────────────┐
│ id  ┆ text                            ┆ embedding                       ┆ similarity │
│ --- ┆ ---                             ┆ ---                             ┆ ---        │
│ i64 ┆ str                             ┆ array[f32, 384]                 ┆ f64        │
╞═════╪═════════════════════════════════╪═════════════════════════════════╪════════════╡
│ 2   ┆ Deep Learning is amazing        ┆ [-0.016128, -0.018325, … -0.06… ┆ 0.825373   │
│ 3   ┆ Polars and FastEmbed are well … ┆ [-0.086584, 0.026477, … 0.0399… ┆ 0.543264   │
│ 1   ┆ Hello world                     ┆ [0.015196, -0.022571, … 0.0260… ┆ 0.52316    │
└─────┴─────────────────────────────────┴─────────────────────────────────┴────────────┘
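
The retrieval step ranks rows by a similarity score against the query embedding. The source does not say which metric produces the similarity column; the sketch below assumes cosine similarity, a common choice for sentence embeddings, and uses toy 2-dimensional vectors in place of real 384-dimensional ones. The function names `cosine_similarity` and `top_k` are illustrative, not part of the plugin.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    rows: Sequence[Tuple[int, Sequence[float]]],
    k: int,
) -> List[Tuple[int, float]]:
    """Score every (id, vector) row against the query and keep the k best."""
    scored = [(row_id, cosine_similarity(query, vec)) for row_id, vec in rows]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy data: (id, embedding) pairs standing in for the DataFrame rows.
rows = [(1, [1.0, 0.0]), (2, [0.6, 0.8]), (3, [0.0, 1.0])]
query = [0.0, 1.0]

ranked = top_k(query, rows, k=2)
print([row_id for row_id, _ in ranked])  # [3, 2]
```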

Note:

  • Running the demo downloads a 133 MB model to .fastembed_cache in your working directory
  • In the original version the embedding column was a 384-dimensional array of f64; here it is a list of f32. It will become an array as well in a future version (watch this space).
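
To see how much disk space the downloaded model takes, something like the following could be run from the same working directory. It is a standard-library sketch; the helper name `dir_size_mb` is made up for illustration, and only the `.fastembed_cache` directory name comes from the note above. Symlinks are skipped on the assumption that a cache may link to files stored elsewhere in the same tree, which would otherwise be counted twice.

```python
from pathlib import Path


def dir_size_mb(path: Path) -> float:
    """Total size of regular files under `path`, in megabytes (10**6 bytes)."""
    if not path.exists():
        return 0.0
    total = sum(
        f.stat().st_size
        for f in path.rglob("*")
        # Count each real file once; ignore symlinks pointing at other files.
        if f.is_file() and not f.is_symlink()
    )
    return total / 1_000_000


if __name__ == "__main__":
    cache = Path.cwd() / ".fastembed_cache"
    print(f"{cache}: {dir_size_mb(cache):.1f} MB")
```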

Contributing

Feel free to open issues or submit pull requests for improvements or bug fixes.

License

MIT License
