A Polars plugin for text embeddings in DataFrames

Polars FastEmbed

A Polars plugin for embedding DataFrames

Installation

uv pip install polars-fastembed

Or, for backward compatibility with older CPUs:

uv pip install "polars-fastembed[rtcompat]"

Features

  • Embed from a DataFrame by specifying the source column(s)
  • Re-order/filter rows by semantic similarity to a query
  • Efficiently reuse loaded models via a global registry (no repeated model loads)
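The registry idea in the last bullet can be illustrated with a minimal sketch. This is not the plugin's actual implementation: `_MODEL_REGISTRY`, `get_or_load`, and `fake_loader` are hypothetical names used only to show how caching by model name avoids repeated loads.

```python
from typing import Callable, Dict, List

# Hypothetical registry: maps a model name to an already-loaded model object.
_MODEL_REGISTRY: Dict[str, object] = {}


def get_or_load(model_name: str, loader: Callable[[str], object]) -> object:
    """Return the cached model for model_name, loading it only on first use."""
    if model_name not in _MODEL_REGISTRY:
        _MODEL_REGISTRY[model_name] = loader(model_name)
    return _MODEL_REGISTRY[model_name]


# Records every call to the loader so repeated loads would be visible.
load_calls: List[str] = []


def fake_loader(name: str) -> dict:
    """Stand-in for an expensive model load (assumption, not a real loader)."""
    load_calls.append(name)
    return {"name": name}


first = get_or_load("Xenova/bge-small-en-v1.5", fake_loader)
second = get_or_load("Xenova/bge-small-en-v1.5", fake_loader)
```

Both lookups return the same object, and the loader runs only once. This is the behaviour the feature list describes as "no repeated model loads".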

Demo

See demo.py

import polars as pl
from polars_fastembed import register_model

# Create a sample DataFrame
df = pl.DataFrame(
    {
        "id": [1, 2, 3],
        "text": [
            "Hello world",
            "Deep Learning is amazing",
            "Polars and FastEmbed are well integrated",
        ],
    }
)

model_id = "Xenova/bge-small-en-v1.5"

# 1) Register a model
#    Optionally specify GPU: providers=["CUDAExecutionProvider"]
#    Or omit it for CPU usage
register_model(model_id, providers=["CPUExecutionProvider"])

# 2) Embed your text
df_emb = df.fastembed.embed(
    columns="text",
    model_name=model_id,
    output_column="embedding",
)

# Inspect embeddings
print(df_emb)

# 3) Perform retrieval
result = df_emb.fastembed.retrieve(
    query="Tell me about deep learning",
    model_name=model_id,
    embedding_column="embedding",
    k=3,
)
print(result)

Output:

shape: (3, 3)
┌─────┬─────────────────────────────────┬─────────────────────────────────┐
│ id  ┆ text                            ┆ embedding                       │
│ --- ┆ ---                             ┆ ---                             │
│ i64 ┆ str                             ┆ array[f32, 384]                 │
╞═════╪═════════════════════════════════╪═════════════════════════════════╡
│ 1   ┆ Hello world                     ┆ [0.015196, -0.022571, … 0.0260… │
│ 2   ┆ Deep Learning is amazing        ┆ [-0.016128, -0.018325, … -0.06… │
│ 3   ┆ Polars and FastEmbed are well … ┆ [-0.086584, 0.026477, … 0.0399… │
└─────┴─────────────────────────────────┴─────────────────────────────────┘
shape: (3, 4)
┌─────┬─────────────────────────────────┬─────────────────────────────────┬────────────┐
│ id  ┆ text                            ┆ embedding                       ┆ similarity │
│ --- ┆ ---                             ┆ ---                             ┆ ---        │
│ i64 ┆ str                             ┆ array[f32, 384]                 ┆ f64        │
╞═════╪═════════════════════════════════╪═════════════════════════════════╪════════════╡
│ 2   ┆ Deep Learning is amazing        ┆ [-0.016128, -0.018325, … -0.06… ┆ 0.825373   │
│ 3   ┆ Polars and FastEmbed are well … ┆ [-0.086584, 0.026477, … 0.0399… ┆ 0.543264   │
│ 1   ┆ Hello world                     ┆ [0.015196, -0.022571, … 0.0260… ┆ 0.52316    │
└─────┴─────────────────────────────────┴─────────────────────────────────┴────────────┘
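The similarity column above orders rows by how close each embedding is to the query embedding. The sketch below shows one common way to compute such a ranking. It assumes cosine similarity, which the source does not state, and the helper names `cosine_similarity` and `top_k` are hypothetical, not part of the plugin's API.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    rows: List[Tuple[int, Sequence[float]]],
    k: int,
) -> List[Tuple[int, float]]:
    """Score every (id, embedding) row against the query; keep the k best."""
    scored = [(row_id, cosine_similarity(query, emb)) for row_id, emb in rows]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 2-dimensional "embeddings" standing in for the 384-dimensional ones.
toy_rows = [(1, [1.0, 0.0]), (2, [0.0, 1.0]), (3, [1.0, 1.0])]
ranked = top_k([0.0, 2.0], toy_rows, k=3)
```

With these toy vectors, row 2 points in the same direction as the query and ranks first, row 3 is at 45 degrees and ranks second, and row 1 is orthogonal and ranks last.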

Note:

  • Running the demo downloads a 133 MB model into .fastembed_cache in your working directory.
  • In the original version, the embedding column was a 384-dimensional array of f64; in this version it is a list of f32. It will become an array as well in future versions (watch this space).

Contributing

Feel free to open issues or submit pull requests for improvements or bug fixes.

License

MIT License
