A Polars plugin for text embeddings in DataFrames

Polars FastEmbed

A Polars plugin for embedding DataFrames

Installation

uv pip install polars-fastembed

Or, for backward compatibility with older CPUs:

uv pip install "polars-fastembed[rtcompat]"

Features

  • Embed from a DataFrame by specifying the source column(s)
  • Re-order/filter rows by semantic similarity to a query
  • Efficiently reuse loaded models via a global registry (no repeated model loads)

Demo

See demo.py

import polars as pl
from polars_fastembed import register_model

# Create a sample DataFrame
df = pl.DataFrame(
    {
        "id": [1, 2, 3],
        "text": [
            "Hello world",
            "Deep Learning is amazing",
            "Polars and FastEmbed are well integrated",
        ],
    }
)

model_id = "Xenova/bge-small-en-v1.5"

# 1) Register a model
#    For GPU, pass providers=["CUDAExecutionProvider"] instead;
#    omit providers (or keep CPUExecutionProvider) for CPU usage
register_model(model_id, providers=["CPUExecutionProvider"])

# 2) Embed your text
df_emb = df.fastembed.embed(
    columns="text",
    model_name=model_id,
    output_column="embedding",
)

# Inspect embeddings
print(df_emb)

# 3) Perform retrieval
result = df_emb.fastembed.retrieve(
    query="Tell me about deep learning",
    model_name=model_id,
    embedding_column="embedding",
    k=3,
)
print(result)

Output:

shape: (3, 3)
┌─────┬─────────────────────────────────┬─────────────────────────────────┐
│ id  ┆ text                            ┆ embedding                       │
│ --- ┆ ---                             ┆ ---                             │
│ i64 ┆ str                             ┆ array[f32, 384]                 │
╞═════╪═════════════════════════════════╪═════════════════════════════════╡
│ 1   ┆ Hello world                     ┆ [0.015196, -0.022571, … 0.0260… │
│ 2   ┆ Deep Learning is amazing        ┆ [-0.016128, -0.018325, … -0.06… │
│ 3   ┆ Polars and FastEmbed are well … ┆ [-0.086584, 0.026477, … 0.0399… │
└─────┴─────────────────────────────────┴─────────────────────────────────┘
shape: (3, 4)
┌─────┬─────────────────────────────────┬─────────────────────────────────┬────────────┐
│ id  ┆ text                            ┆ embedding                       ┆ similarity │
│ --- ┆ ---                             ┆ ---                             ┆ ---        │
│ i64 ┆ str                             ┆ array[f32, 384]                 ┆ f64        │
╞═════╪═════════════════════════════════╪═════════════════════════════════╪════════════╡
│ 2   ┆ Deep Learning is amazing        ┆ [-0.016128, -0.018325, … -0.06… ┆ 0.825373   │
│ 3   ┆ Polars and FastEmbed are well … ┆ [-0.086584, 0.026477, … 0.0399… ┆ 0.543264   │
│ 1   ┆ Hello world                     ┆ [0.015196, -0.022571, … 0.0260… ┆ 0.52316    │
└─────┴─────────────────────────────────┴─────────────────────────────────┴────────────┘
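`retrieve` embeds the query and returns the top `k` rows ordered by a `similarity` score. Below is a pure-Python sketch of that ranking step, assuming the score is cosine similarity (the README does not name the metric). The helpers `cosine` and `top_k` are hypothetical, and the 2-D toy vectors stand in for the 384-dimensional embeddings above.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    rows: List[Tuple[int, Sequence[float]]],
    k: int,
) -> List[Tuple[int, float]]:
    """Score every (id, embedding) row against the query and keep the k best."""
    scored = [(row_id, cosine(query, emb)) for row_id, emb in rows]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # highest similarity first
    return scored[:k]


# Toy embeddings: row 3 points the same way as the query, row 1 is orthogonal.
rows = [(1, [1.0, 0.0]), (2, [0.6, 0.8]), (3, [0.0, 1.0])]
print(top_k([0.0, 1.0], rows, k=2))  # row 3 first, then row 2
```

Sorting the full list is fine for small tables; for large ones a heap (`heapq.nlargest`) avoids sorting rows that will be discarded.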

Note:

  • The first run downloads a 133 MB model into .fastembed_cache in your working directory.
  • In the original version, each embedding was a 384-dimensional array of f64; here the values are f32, and the output above shows them stored as a fixed-size array[f32, 384].
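Storing f32 instead of f64 halves the memory per embedding. A back-of-the-envelope check with the standard-library `array` module, assuming a 4-byte C `float` and an 8-byte C `double` (true on mainstream platforms); the figures cover raw values only and exclude any container overhead. The one-million-row table size is a hypothetical example.

```python
from array import array

DIM = 384  # embedding width, from the array[f32, 384] dtype above

f32_row = array("f", [0.0] * DIM)  # 'f' = C float, typically 4 bytes
f64_row = array("d", [0.0] * DIM)  # 'd' = C double, typically 8 bytes

f32_bytes = f32_row.itemsize * len(f32_row)
f64_bytes = f64_row.itemsize * len(f64_row)
print(f32_bytes, f64_bytes)  # 1536 3072

# Hypothetical table of one million embedded rows.
rows = 1_000_000
print(f"{rows * f32_bytes / 2**20:.1f} MiB vs {rows * f64_bytes / 2**20:.1f} MiB")
# 1464.8 MiB vs 2929.7 MiB
```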

Contributing

Feel free to open issues or submit pull requests for improvements or bug fixes.

License

MIT License

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

polars_fastembed-0.1.5.tar.gz (113.4 kB)

Tags: Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

polars_fastembed-0.1.5-pp310-pypy310_pp73-manylinux_2_28_aarch64.whl (13.9 MB)

Tags: PyPy, manylinux: glibc 2.28+ ARM64

polars_fastembed-0.1.5-pp39-pypy39_pp73-manylinux_2_28_aarch64.whl (13.9 MB)

Tags: PyPy, manylinux: glibc 2.28+ ARM64

polars_fastembed-0.1.5-cp38-abi3-win_amd64.whl (11.1 MB)

Tags: CPython 3.8+, Windows x86-64

polars_fastembed-0.1.5-cp38-abi3-manylinux_2_28_i686.whl (17.4 MB)

Tags: CPython 3.8+, manylinux: glibc 2.28+ i686

polars_fastembed-0.1.5-cp38-abi3-manylinux_2_28_aarch64.whl (13.9 MB)

Tags: CPython 3.8+, manylinux: glibc 2.28+ ARM64

polars_fastembed-0.1.5-cp38-abi3-macosx_11_0_arm64.whl (9.7 MB)

Tags: CPython 3.8+, macOS 11.0+ ARM64

polars_fastembed-0.1.5-cp38-abi3-macosx_10_12_x86_64.whl (11.4 MB)

Tags: CPython 3.8+, macOS 10.12+ x86-64

File details

Details for the file polars_fastembed-0.1.5.tar.gz.

File metadata

  • Download URL: polars_fastembed-0.1.5.tar.gz
  • Upload date:
  • Size: 113.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: maturin/1.10.2

File hashes

Hashes for polars_fastembed-0.1.5.tar.gz
Algorithm Hash digest
SHA256 f8548a1c349f503ee870a0b18f278fa06fa4e9951a4559cea1bc928642cc0b02
MD5 7ae6c1b7278ccce42b955ea44d6a24a0
BLAKE2b-256 ac890a750fd367aaa511752aa794c74d671cf636c773a48456267194084d5c53

See more details on using hashes here.

File details

Details for the file polars_fastembed-0.1.5-pp310-pypy310_pp73-manylinux_2_28_aarch64.whl.

File hashes

Hashes for polars_fastembed-0.1.5-pp310-pypy310_pp73-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 7cf451c0e81833aa9e07ef63122843162aaac1edb02bdbf0260d818afb69594c
MD5 a1a0c2ff8807380dca9a6d97b1eeb335
BLAKE2b-256 0f7753ca56fd765ce1c9d3bfc7fcb7692827cfa0a39d00257f55919d37c59044

File details

Details for the file polars_fastembed-0.1.5-pp39-pypy39_pp73-manylinux_2_28_aarch64.whl.

File hashes

Hashes for polars_fastembed-0.1.5-pp39-pypy39_pp73-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 2db49d5f5382adf73369695d6d492eb8688adbe0be5c0387290cdd4e4ccff24d
MD5 cc96e5fc23d7898cf429a3e08f8d1457
BLAKE2b-256 4245bc6367ce6f01ec0a82e4ae39124a1eb4343930296fc11475af57b1f9775d

File details

Details for the file polars_fastembed-0.1.5-cp38-abi3-win_amd64.whl.

File hashes

Hashes for polars_fastembed-0.1.5-cp38-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 eb035314038fb70d244657a6d0af59a6886ca0a60e01d90c54f283790dfc319f
MD5 87daec86466e169273a75cfc40ea981a
BLAKE2b-256 04808dacd887b607504d6280afa20058992d2ed07af0a0b5af3c1c6dcb11b78f

File details

Details for the file polars_fastembed-0.1.5-cp38-abi3-manylinux_2_28_i686.whl.

File hashes

Hashes for polars_fastembed-0.1.5-cp38-abi3-manylinux_2_28_i686.whl
Algorithm Hash digest
SHA256 59cdd55e88a65760d117082ff9cc213707d310acf7be766810b2c19db482dec6
MD5 2eb703d56f505c9edb6fb819a5ebf9a5
BLAKE2b-256 d9b07b75d1f1ece4be6ee5de33aefacf226faadc29a238f729ded7bac964ee11

File details

Details for the file polars_fastembed-0.1.5-cp38-abi3-manylinux_2_28_aarch64.whl.

File hashes

Hashes for polars_fastembed-0.1.5-cp38-abi3-manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 92a6f467adea94e5729a16d3a3b5ebf1ad44160948744ccb19f9289a6204ad29
MD5 c8b0fa0a5265e14d8a82dd373d4262a1
BLAKE2b-256 12053717acf4b2b035e16311d73d038863475d3b16b56828ae08a11a0691274a

File details

Details for the file polars_fastembed-0.1.5-cp38-abi3-macosx_11_0_arm64.whl.

File hashes

Hashes for polars_fastembed-0.1.5-cp38-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 7eb6f2e83c0d1872b1682449b8831e46d64e66e97089bbd07184294223ce9d80
MD5 68bf99712327f53f04123e951cf0764d
BLAKE2b-256 b5a843a82c70fa70251c4cbcf3a1ad8c05bfb551965940f5bbfef69aad572074

File details

Details for the file polars_fastembed-0.1.5-cp38-abi3-macosx_10_12_x86_64.whl.

File hashes

Hashes for polars_fastembed-0.1.5-cp38-abi3-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 211a9d7b6dc01e134a95c33763d0f3b0b7bb4275daead76a7db42761b1580d66
MD5 7200a0fa8ae96b22ba169cda98dcc00c
BLAKE2b-256 312de71c776fe66763d1f18f54c9f78d8ee8236c1316ca54ba3971a8ceb86aea
