
A Polars plugin for embedding DataFrames


Polars FastEmbed



Installation

pip install polars-fastembed

Polars itself is required but is not installed by default. It is shipped as an optional extra, which you select by adding its name in square brackets:

pip install polars-fastembed[polars]          # most users should install regular Polars
pip install polars-fastembed[polars-lts-cpu]  # for compatibility with older CPUs
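Because Polars arrives through an extra, it can help to confirm which variant an environment actually has. The sketch below uses only the standard library's `importlib.metadata`. The helper `installed_polars_variant` is hypothetical (not part of polars-fastembed), and it assumes the two extras correspond to the `polars` and `polars-lts-cpu` distributions named above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# Distribution names taken from the two extras above; checked in this order.
POLARS_DISTRIBUTIONS = ("polars", "polars-lts-cpu")


def installed_polars_variant() -> Optional[str]:
    """Return 'name==version' for the first Polars distribution found, else None.

    Hypothetical helper for illustration only.
    """
    for dist in POLARS_DISTRIBUTIONS:
        try:
            return f"{dist}=={version(dist)}"
        except PackageNotFoundError:
            continue  # this variant is not installed; try the next one
    return None


if __name__ == "__main__":
    variant = installed_polars_variant()
    print(variant or "No Polars distribution found; install one of the extras above.")
```

If both distributions are present, the sketch reports `polars` first. Having both installed at once usually points to a mixed-up environment, since both provide the same importable `polars` module.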

Features

  • Embed from a DataFrame by specifying the source column(s)
  • Re-order/filter rows by semantic similarity to a query
  • Efficiently reuse loaded models via a global registry (no repeated model loads)
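You can picture the model registry as a module-level cache keyed by model name, so that each model is loaded at most once per process. The sketch below illustrates that pattern only. It is not polars-fastembed's actual code. `_REGISTRY`, `register`, `get_model`, and the `FakeModel` class are hypothetical stand-ins for a real (and expensive) embedding-model load.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FakeModel:
    """Hypothetical stand-in for an expensive-to-load embedding model."""
    name: str
    providers: List[str] = field(default_factory=list)


# Module-level registry: model name -> loaded model instance.
_REGISTRY: Dict[str, FakeModel] = {}
LOAD_COUNT = 0  # counts actual loads, to show that reuse avoids reloading


def register(name: str, providers: Optional[List[str]] = None) -> FakeModel:
    """Load a model once; later calls with the same name return the cached one."""
    global LOAD_COUNT
    if name not in _REGISTRY:
        LOAD_COUNT += 1  # the only place a (simulated) load happens
        _REGISTRY[name] = FakeModel(name, list(providers or []))
    return _REGISTRY[name]


def get_model(name: str) -> FakeModel:
    """Look up a previously registered model, failing loudly if it is missing."""
    try:
        return _REGISTRY[name]
    except KeyError:
        raise KeyError(f"Model {name!r} is not registered; call register() first") from None


if __name__ == "__main__":
    first = register("BAAI/bge-small-en", providers=["CPUExecutionProvider"])
    again = register("BAAI/bge-small-en")
    print(first is again, LOAD_COUNT)  # True 1
```

In this sketch, a second `register` call with different `providers` still returns the cached instance. The README does not say whether polars-fastembed behaves the same way.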

Demo

See demo.py

import polars as pl
from polars_fastembed import register_model

# Create a sample DataFrame
df = pl.DataFrame(
    {
        "id": [1, 2, 3],
        "text": [
            "Hello world",
            "Deep Learning is amazing",
            "Polars and FastEmbed are well integrated",
        ],
    }
)

model_id = "BAAI/bge-small-en"

# 1) Register a model
#    Pass providers=["CUDAExecutionProvider"] to run on a GPU,
#    or omit providers (or pass CPUExecutionProvider, as below) for CPU usage
register_model(model_id, providers=["CPUExecutionProvider"])

# 2) Embed your text
df_emb = df.fastembed.embed(
    columns="text", model_name=model_id, output_column="embedding"
)

# Inspect embeddings
print(df_emb)

# 3) Perform retrieval
result = df_emb.fastembed.retrieve(
    query="Tell me about deep learning",
    model_name=model_id,
    embedding_column="embedding",
    k=3,
)
print(result)

Output:

shape: (3, 3)
┌─────┬─────────────────────────────────┬─────────────────────────────────┐
│ id  ┆ text                            ┆ embedding                       │
│ --- ┆ ---                             ┆ ---                             │
│ i64 ┆ str                             ┆ array[f64, 384]                 │
╞═════╪═════════════════════════════════╪═════════════════════════════════╡
│ 1   ┆ Hello world                     ┆ [-0.023137, -0.025523, … 0.028… │
│ 2   ┆ Deep Learning is amazing        ┆ [-0.031434, -0.031442, … -0.03… │
│ 3   ┆ Polars and FastEmbed are well … ┆ [-0.074164, 0.002853, … 0.0247… │
└─────┴─────────────────────────────────┴─────────────────────────────────┘
shape: (3, 4)
┌─────┬─────────────────────────────────┬─────────────────────────────────┬────────────┐
│ id  ┆ text                            ┆ embedding                       ┆ similarity │
│ --- ┆ ---                             ┆ ---                             ┆ ---        │
│ i64 ┆ str                             ┆ array[f64, 384]                 ┆ f64        │
╞═════╪═════════════════════════════════╪═════════════════════════════════╪════════════╡
│ 2   ┆ Deep Learning is amazing        ┆ [-0.031434, -0.031442, … -0.03… ┆ 0.924065   │
│ 1   ┆ Hello world                     ┆ [-0.023137, -0.025523, … 0.028… ┆ 0.828904   │
│ 3   ┆ Polars and FastEmbed are well … ┆ [-0.074164, 0.002853, … 0.0247… ┆ 0.805416   │
└─────┴─────────────────────────────────┴─────────────────────────────────┴────────────┘
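The `similarity` column and the new row order suggest that `retrieve` embeds the query and ranks rows by a vector similarity score. The README does not name the metric. The stdlib sketch below assumes cosine similarity and uses tiny made-up 3-dimensional vectors (not real bge-small-en embeddings) to show only the ranking step. `cosine` and `top_k` are hypothetical helpers, not the library's API.

```python
import math
from typing import List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity: dot(a, b) / (|a| * |b|); 0.0 if either vector is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query: List[float], rows: List[Tuple[int, List[float]]], k: int
) -> List[Tuple[int, float]]:
    """Score every (row_id, embedding) pair against the query and keep the k best."""
    scored = [(row_id, cosine(query, emb)) for row_id, emb in rows]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # highest similarity first
    return scored[:k]


if __name__ == "__main__":
    # Toy vectors for illustration only; the demo's embeddings are 384-dimensional.
    rows = [(1, [1.0, 0.0, 0.0]), (2, [0.6, 0.8, 0.0]), (3, [0.0, 0.0, 1.0])]
    query = [0.0, 1.0, 0.0]
    for row_id, score in top_k(query, rows, k=2):
        print(row_id, round(score, 3))
```

The demo passes `k=3` on a three-row DataFrame, so every row is kept and only the order changes. Assuming `k` caps the number of rows returned, as its name suggests, a smaller `k` would also drop the lowest-scoring rows.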

Contributing

Feel free to open issues or submit pull requests for improvements or bug fixes.

License

MIT License



Download files


Source Distribution

polars_fastembed-0.1.0.tar.gz (4.6 kB)

Uploaded Source

Built Distribution


polars_fastembed-0.1.0-py3-none-any.whl (5.2 kB)

Uploaded Python 3

File details

Details for the file polars_fastembed-0.1.0.tar.gz.

File metadata

  • Download URL: polars_fastembed-0.1.0.tar.gz
  • Upload date:
  • Size: 4.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: pdm/2.22.3 CPython/3.10.6 Linux/5.15.0-125-generic

File hashes

Hashes for polars_fastembed-0.1.0.tar.gz
Algorithm Hash digest
SHA256 6b1877423c2ff4b8429935afc2ff4e069f61d31f055d4b297b5192b7016a95b1
MD5 832074e98c12543918882595227d55c6
BLAKE2b-256 0d5b857d449d1e3a4cfec3dcd009cf97f99684b618e7f08e41dce2e615f2ca44


File details

Details for the file polars_fastembed-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: polars_fastembed-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 5.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: pdm/2.22.3 CPython/3.10.6 Linux/5.15.0-125-generic

File hashes

Hashes for polars_fastembed-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 4a1c05394e5a8a26ff3707dc91ff6ca4eac32ba7879fc124e3dc9a198010c0f9
MD5 6f97b3642b49711c239af66bebc77737
BLAKE2b-256 ca750d75dc0417c36cfc8be1aa9a9d1d4530c49fa643e0538918d685087b3506
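The published digests let you check a downloaded file before installing it. Below is a minimal sketch using the standard library's `hashlib`. The helpers `sha256_of` and `verify` are hypothetical, and the paths are whatever files you downloaded. pip can also enforce hashes itself through its hash-checking mode (`--require-hashes` with a requirements file).

```python
import hashlib
import sys
from pathlib import Path

# SHA256 digests listed above for polars-fastembed 0.1.0.
EXPECTED = {
    "polars_fastembed-0.1.0.tar.gz": "6b1877423c2ff4b8429935afc2ff4e069f61d31f055d4b297b5192b7016a95b1",
    "polars_fastembed-0.1.0-py3-none-any.whl": "4a1c05394e5a8a26ff3707dc91ff6ca4eac32ba7879fc124e3dc9a198010c0f9",
}


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path) -> bool:
    """Compare a downloaded file's SHA256 with the published digest for its name."""
    expected = EXPECTED.get(path.name)
    if expected is None:
        raise KeyError(f"No published digest for {path.name}")
    return sha256_of(path) == expected


if __name__ == "__main__":
    # Usage: python verify_hashes.py polars_fastembed-0.1.0.tar.gz ...
    for arg in sys.argv[1:]:
        p = Path(arg)
        print(p.name, "OK" if verify(p) else "MISMATCH")
```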

