


Polars FastEmbed


A Polars plugin for embedding text columns in DataFrames

Installation

uv pip install polars-fastembed

Or, for backward compatibility with older CPUs (quote the extra so shells such as zsh do not expand the brackets):

uv pip install "polars-fastembed[rtcompat]"

Features

  • Embed from a DataFrame by specifying the source column(s)
  • Re-order/filter rows by semantic similarity to a query
  • Efficiently reuse loaded models via a global registry (no repeated model loads)
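The registry in the last bullet is a load-once cache: the first registration pays the model-loading cost, and later lookups reuse the same object. The plugin's internals are not documented here, so the following is only a generic stdlib sketch of that pattern. `FakeModel`, `_REGISTRY`, `register`, and `get` are hypothetical names, not the polars-fastembed API (whose entry point in the demo is `register_model`).

```python
from typing import Callable, Dict


# Hypothetical stand-in for an expensive-to-load embedding model.
class FakeModel:
    load_count = 0  # how many times a model was actually "loaded"

    def __init__(self, name: str) -> None:
        FakeModel.load_count += 1
        self.name = name


# Module-level registry: model name -> loaded model instance.
_REGISTRY: Dict[str, FakeModel] = {}


def register(name: str, loader: Callable[[str], FakeModel] = FakeModel) -> FakeModel:
    """Load the model on first registration; later calls return the cached instance."""
    if name not in _REGISTRY:
        _REGISTRY[name] = loader(name)
    return _REGISTRY[name]


def get(name: str) -> FakeModel:
    """Look up a previously registered model, failing loudly if it is absent."""
    try:
        return _REGISTRY[name]
    except KeyError:
        raise KeyError(f"model {name!r} is not registered") from None


register("Xenova/bge-small-en-v1.5")
register("Xenova/bge-small-en-v1.5")  # second call hits the cache, no reload
m1 = get("Xenova/bge-small-en-v1.5")
m2 = get("Xenova/bge-small-en-v1.5")
print(m1 is m2, FakeModel.load_count)  # True 1
```

Keeping the cache at module level means every caller in the process shares one loaded instance per name, which is what avoids repeated model loads.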

Demo

See demo.py

import polars as pl
from polars_fastembed import register_model

# Create a sample DataFrame
df = pl.DataFrame(
    {
        "id": [1, 2, 3],
        "text": [
            "Hello world",
            "Deep Learning is amazing",
            "Polars and FastEmbed are well integrated",
        ],
    }
)

model_id = "Xenova/bge-small-en-v1.5"

# 1) Register a model
#    Optionally specify GPU: providers=["CUDAExecutionProvider"]
#    Or omit it for CPU usage
register_model(model_id, providers=["CPUExecutionProvider"])

# 2) Embed your text
df_emb = df.fastembed.embed(
    columns="text",
    model_name=model_id,
    output_column="embedding",
)

# Inspect embeddings
print(df_emb)

# 3) Perform retrieval
result = df_emb.fastembed.retrieve(
    query="Tell me about deep learning",
    model_name=model_id,
    embedding_column="embedding",
    k=3,
)
print(result)

Output:

shape: (3, 3)
┌─────┬─────────────────────────────────┬─────────────────────────────────┐
│ id  ┆ text                            ┆ embedding                       │
│ --- ┆ ---                             ┆ ---                             │
│ i64 ┆ str                             ┆ array[f32, 384]                 │
╞═════╪═════════════════════════════════╪═════════════════════════════════╡
│ 1   ┆ Hello world                     ┆ [0.015196, -0.022571, … 0.0260… │
│ 2   ┆ Deep Learning is amazing        ┆ [-0.016128, -0.018325, … -0.06… │
│ 3   ┆ Polars and FastEmbed are well … ┆ [-0.086584, 0.026477, … 0.0399… │
└─────┴─────────────────────────────────┴─────────────────────────────────┘
shape: (3, 4)
┌─────┬─────────────────────────────────┬─────────────────────────────────┬────────────┐
│ id  ┆ text                            ┆ embedding                       ┆ similarity │
│ --- ┆ ---                             ┆ ---                             ┆ ---        │
│ i64 ┆ str                             ┆ array[f32, 384]                 ┆ f64        │
╞═════╪═════════════════════════════════╪═════════════════════════════════╪════════════╡
│ 2   ┆ Deep Learning is amazing        ┆ [-0.016128, -0.018325, … -0.06… ┆ 0.825373   │
│ 3   ┆ Polars and FastEmbed are well … ┆ [-0.086584, 0.026477, … 0.0399… ┆ 0.543264   │
│ 1   ┆ Hello world                     ┆ [0.015196, -0.022571, … 0.0260… ┆ 0.52316    │
└─────┴─────────────────────────────────┴─────────────────────────────────┴────────────┘
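The README does not say which measure fills the `similarity` column returned by `retrieve`. Assuming cosine similarity (an assumption, though a common choice for sentence embeddings), ranking rows against a query and keeping the top `k` reduces to the stdlib sketch below. `cosine` and `top_k` are hypothetical helpers, and the toy 3-dimensional vectors stand in for the 384-dimensional embeddings; this is not the plugin's implementation.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity: dot(a, b) / (|a| * |b|); returns 0.0 for a zero vector."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        return 0.0
    return dot / norm


def top_k(
    query: Sequence[float], rows: List[Tuple[int, Sequence[float]]], k: int
) -> List[Tuple[int, float]]:
    """Score every row against the query, sort highest first, keep the first k."""
    scored = [(row_id, cosine(query, emb)) for row_id, emb in rows]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy (id, embedding) pairs and a query vector.
rows = [(1, [1.0, 0.0, 0.0]), (2, [0.0, 2.0, 0.0]), (3, [3.0, 4.0, 0.0])]
query = [0.0, 1.0, 0.0]
print(top_k(query, rows, k=2))  # [(2, 1.0), (3, 0.8)]
```

Because the scores are sorted in descending order before truncating to `k`, the result has the same shape as the demo output, where rows come back ordered by decreasing `similarity`.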

Note:

  • Running the demo downloads a 133 MB model into .fastembed_cache in your working directory.
  • In the original version the embedding was a 384-dimensional array of f64; here it is stored as f32, and the demo output above shows it as a fixed-size array[f32, 384].
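To put the f32 storage from the note in concrete terms: each 32-bit value takes 4 bytes instead of 8, so one 384-dimensional embedding occupies 1,536 bytes rather than 3,072, at the cost of roughly 7 significant decimal digits of precision. The stdlib sketch below works through that arithmetic and round-trips one value through 32-bit storage with `struct`. `embedding_bytes` and `to_f32` are illustrative helpers, and the byte counts ignore any container or Polars overhead.

```python
import struct

DIM = 384  # embedding width shown in the demo output (array[f32, 384])


def embedding_bytes(n_rows: int, dim: int = DIM, bytes_per_value: int = 4) -> int:
    """Raw value storage for n_rows embeddings, ignoring container overhead."""
    return n_rows * dim * bytes_per_value


def to_f32(x: float) -> float:
    """Round-trip a Python float (an f64) through 32-bit IEEE 754 storage."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


f32_total = embedding_bytes(1_000_000, bytes_per_value=4)
f64_total = embedding_bytes(1_000_000, bytes_per_value=8)
print(f32_total, f64_total)  # 1536000000 3072000000

x = 0.015196  # first value of the first embedding, as printed in the demo
print(abs(to_f32(x) - x) < 1e-8)  # True: the f32 rounding error is far below 1e-8
```

For a million rows this is about 1.5 GB of f32 values versus about 3 GB as f64, which is usually a worthwhile trade for similarity search.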

Contributing

Feel free to open issues or submit pull requests for improvements or bug fixes.

License

MIT License



Download files

Download the file for your platform.

Source Distribution

  • polars_fastembed-0.1.4.tar.gz (113.4 kB)

Built Distributions

  • polars_fastembed-0.1.4-pp310-pypy310_pp73-manylinux_2_28_aarch64.whl (13.9 MB): PyPy, manylinux glibc 2.28+, ARM64
  • polars_fastembed-0.1.4-pp39-pypy39_pp73-manylinux_2_28_aarch64.whl (13.9 MB): PyPy, manylinux glibc 2.28+, ARM64
  • polars_fastembed-0.1.4-cp38-abi3-win_amd64.whl (11.1 MB): CPython 3.8+, Windows x86-64
  • polars_fastembed-0.1.4-cp38-abi3-manylinux_2_28_i686.whl (17.4 MB): CPython 3.8+, manylinux glibc 2.28+, i686
  • polars_fastembed-0.1.4-cp38-abi3-manylinux_2_28_aarch64.whl (13.9 MB): CPython 3.8+, manylinux glibc 2.28+, ARM64
  • polars_fastembed-0.1.4-cp38-abi3-macosx_11_0_arm64.whl (9.7 MB): CPython 3.8+, macOS 11.0+, ARM64
  • polars_fastembed-0.1.4-cp38-abi3-macosx_10_12_x86_64.whl (11.4 MB): CPython 3.8+, macOS 10.12+, x86-64

File details

Details for the file polars_fastembed-0.1.4.tar.gz.

File metadata

  • Download URL: polars_fastembed-0.1.4.tar.gz
  • Upload date:
  • Size: 113.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: maturin/1.10.2

File hashes

  • SHA256: 942cb2255e48b4d40cf77aba1345fa2069963b090b38be28d728f7bbceb4583e
  • MD5: f14c32ce9dd1b3cfc317f304d4c774e7
  • BLAKE2b-256: 03ef93b39541ced6cde65c1f88d68c4d8b2f9c19a66ede3ba41ae7b0a2416239
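To check a downloaded file against the SHA256 digests listed on this page, a short `hashlib` sketch is enough. `sha256_of` and `matches_published` are illustrative helper names, and the example assumes the source distribution has already been downloaded into the working directory. pip can also enforce digests itself in hash-checking mode, when installing from a requirements file that pins `--hash=sha256:<digest>` values.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA256 digest of a file, read in chunks so large wheels stay out of memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # iter(callable, sentinel) keeps calling read() until it returns b"" at EOF
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: Path, expected_sha256: str) -> bool:
    """Compare against a digest copied from the listing (case-insensitive)."""
    return sha256_of(path) == expected_sha256.strip().lower()


if __name__ == "__main__":
    # Assumes the sdist was downloaded into the current directory.
    sdist = Path("polars_fastembed-0.1.4.tar.gz")
    published = "942cb2255e48b4d40cf77aba1345fa2069963b090b38be28d728f7bbceb4583e"
    if sdist.exists():
        print("OK" if matches_published(sdist, published) else "MISMATCH")
```

The same functions work for any wheel above; only the file name and the published digest change.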

File details

Details for the file polars_fastembed-0.1.4-pp310-pypy310_pp73-manylinux_2_28_aarch64.whl.

File hashes

  • SHA256: 79538d61617af9d8d56183c85fa1d9c92617d801a4b1dd7357343bfe22db0e39
  • MD5: 754e138f99722ca6ed642d1218e89893
  • BLAKE2b-256: 7defc1cdf8ea6aa0a159b49ec58dbf229a1552368bc2d97ed362a5bf3011a191

File details

Details for the file polars_fastembed-0.1.4-pp39-pypy39_pp73-manylinux_2_28_aarch64.whl.

File hashes

  • SHA256: 0c315f544f556c4bd1600f070d0825be0715d17fb2b4f4420bbc981f7bea6729
  • MD5: 0fa90bb51604c16b0706b1ac0ff8df74
  • BLAKE2b-256: 93e361711b55cb7761e660712b27a7e7e94d26b4c0ba6e3a65094d0c08f32ded

File details

Details for the file polars_fastembed-0.1.4-cp38-abi3-win_amd64.whl.

File hashes

  • SHA256: 08392d2349727ff1266cb10375a0b8fdb44e968e6b0d84d362c269bc424e03c3
  • MD5: 6fd9f1f7b79162a3195a460624367f52
  • BLAKE2b-256: 2218f634a60edd699cc139d1b14938b9d73ba4d130c1ece3abff157852e7c40e

File details

Details for the file polars_fastembed-0.1.4-cp38-abi3-manylinux_2_28_i686.whl.

File hashes

  • SHA256: 8a58125b38f0a7be07880326fdf287ef49769ea5568d8c4c18a88a1c8a5c43cf
  • MD5: b11c27f2c4f16adc13a4ebb4e63c2575
  • BLAKE2b-256: 590ab7891bf12b2469d3e51327ccb4b94507e297920e906bc73aa778e6d35823

File details

Details for the file polars_fastembed-0.1.4-cp38-abi3-manylinux_2_28_aarch64.whl.

File hashes

  • SHA256: 8a6ea2349081f0d1223aa7ab011ce049bdca99e1535f18c1bffd3cbdf1bb4997
  • MD5: e12433171786e2364747415665a27d50
  • BLAKE2b-256: 2e902f7b1a590034e21a7854d8c3dac9ed8bb2323cbd08e01aa2287a49bb4055

File details

Details for the file polars_fastembed-0.1.4-cp38-abi3-macosx_11_0_arm64.whl.

File hashes

  • SHA256: 9c88676c3bf2a8a029f60186611e369d224cae1f4db9808a9b49a3f7fec63522
  • MD5: 28691b73e1d5e3c6d90e9fe66e09e66f
  • BLAKE2b-256: 9d07080ab6097059039635698d9aeea3de58ce193acaf1f94af3be5682cce0b8

File details

Details for the file polars_fastembed-0.1.4-cp38-abi3-macosx_10_12_x86_64.whl.

File hashes

  • SHA256: 0c110937b3246d4445861dc4dcda9d63cbd63207e8dfab83b292db6d38cb8839
  • MD5: b8988d32ba6bb27572282359e92de1b0
  • BLAKE2b-256: d6cda2f3c3c136caedfcc62414bc782863408b9ac894cd7a2b09a9a643eebd06
