A text embedding extension for the Polars DataFrame library.

polars-candle

A polars extension for running candle ML models on polars DataFrames.

Example

Pull any supported model from Huggingface, such as Snowflake/snowflake-arctic-embed-xs, and embed text with a single expression.

import polars as pl
# Importing polars_candle makes the `candle` expression namespace used below available.
import polars_candle  # noqa: F401

df = pl.DataFrame({"s": ["This is a sentence", "This is another sentence"]})

df = df.with_columns(
    pl.col("s").candle.embed_text("Snowflake/snowflake-arctic-embed-xs").alias("s_embedding")
)
print(df)
# ┌──────────────────────────┬───────────────────────────────────┐
# │ s                        ┆ s_embedding                       │
# │ ---                      ┆ ---                               │
# │ str                      ┆ array[f32, 384]                   │
# ╞══════════════════════════╪═══════════════════════════════════╡
# │ This is a sentence       ┆ [-0.056457, 0.559411, … -0.20403… │
# │ This is another sentence ┆ [-0.117206, 0.336827, … 0.174078… │
# └──────────────────────────┴───────────────────────────────────┘

Currently, Bert, JinaBert, and Distilbert models are supported. More models will be added in the future. See my other repository, wdoppenberg/glowrs, to learn more about the underlying sentence embedding implementation.
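
Once the `s_embedding` column exists, a common next step is comparing rows by cosine similarity. The sketch below is not part of polars-candle: `cosine_similarity` is a hypothetical helper written in plain Python, and the short vectors are made-up stand-ins for real embeddings, which could presumably be pulled out of the frame as lists with something like `df["s_embedding"].to_list()`.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length sequences of floats.

    Hypothetical helper for illustration; not provided by polars-candle.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Made-up 3-dimensional vectors standing in for real embeddings.
first = [0.1, 0.3, -0.2]
second = [0.2, 0.1, -0.4]
print(cosine_similarity(first, second))
```

Plain Python keeps the sketch dependency-free; for large frames a vectorised approach would likely be preferable.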

Installation

Clone the repository and install the package using:

pip install .

Note: A PyPI package is not available yet, but one is planned.

If you're on a Mac with an ARM processor, the library will install with Metal acceleration by default. Should you want more control over the installation, you can install the package using:

maturin develop --release -F <feature>

Where <feature> can be one of the following:

  • metal: Install with Metal acceleration.
  • cuda: Install with CUDA acceleration.
  • accelerate: Install with the Accelerate framework.
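
The feature choice above can be sketched as a small helper that assembles the maturin command. This is only an illustration: `build_maturin_command` and `default_feature` are hypothetical names, and the rule "Metal on Apple Silicon, otherwise no feature flag" is an assumption based on the description above, not something the package provides.

```python
import platform
import shlex

# Feature names taken from the list above; the selection logic is illustrative.
KNOWN_FEATURES = ("metal", "cuda", "accelerate")


def build_maturin_command(feature=None):
    """Return the maturin invocation as an argument list, optionally with a feature."""
    cmd = ["maturin", "develop", "--release"]
    if feature is not None:
        if feature not in KNOWN_FEATURES:
            raise ValueError(f"unknown feature: {feature!r}")
        cmd += ["-F", feature]
    return cmd


def default_feature(system=None, machine=None):
    """Guess a feature: Metal on an ARM Mac, otherwise None (assumption)."""
    system = system or platform.system()
    machine = machine or platform.machine()
    if system == "Darwin" and machine == "arm64":
        return "metal"
    return None


if __name__ == "__main__":
    # Print the command rather than running it, so it can be inspected first.
    print(shlex.join(build_maturin_command(default_feature())))
```

CUDA availability cannot be inferred from the platform name alone, so the sketch leaves that choice to an explicit `build_maturin_command("cuda")` call.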

Roadmap

  • Embed text using Bert, JinaBert, and Distilbert models.
  • Add more models.
  • More configuration options for embedding (e.g. pooling strategy, device selection, etc.).
  • Support & test streaming workloads.

Credits

  • Massive thanks to polars & their contributors for providing a blazing fast DataFrame library with the ability to extend it with custom functions using pyo3-polars.
  • Great work so far by Huggingface on candle for providing a simple interface to run ML models.

Note

This is a work in progress and the API might change in the future. Feel free to open an issue if you have any suggestions or improvements.

Download files

Source Distributions

No source distribution files are available for this release.

Built Distributions

  • polars_candle-0.1.3-cp312-none-win_amd64.whl (6.3 MB): CPython 3.12, Windows x86-64
  • polars_candle-0.1.3-cp312-cp312-manylinux_2_34_x86_64.whl (6.9 MB): CPython 3.12, manylinux (glibc 2.34+) x86-64
  • polars_candle-0.1.3-cp312-cp312-macosx_11_0_arm64.whl (5.8 MB): CPython 3.12, macOS 11.0+ ARM64
  • polars_candle-0.1.3-cp311-none-win_amd64.whl (6.3 MB): CPython 3.11, Windows x86-64
  • polars_candle-0.1.3-cp311-cp311-manylinux_2_34_x86_64.whl (6.9 MB): CPython 3.11, manylinux (glibc 2.34+) x86-64
  • polars_candle-0.1.3-cp311-cp311-macosx_11_0_arm64.whl (5.8 MB): CPython 3.11, macOS 11.0+ ARM64
  • polars_candle-0.1.3-cp310-none-win_amd64.whl (6.3 MB): CPython 3.10, Windows x86-64
  • polars_candle-0.1.3-cp310-cp310-manylinux_2_34_x86_64.whl (6.9 MB): CPython 3.10, manylinux (glibc 2.34+) x86-64
  • polars_candle-0.1.3-cp310-cp310-macosx_11_0_arm64.whl (5.8 MB): CPython 3.10, macOS 11.0+ ARM64
  • polars_candle-0.1.3-cp39-none-win_amd64.whl (6.3 MB): CPython 3.9, Windows x86-64
  • polars_candle-0.1.3-cp39-cp39-manylinux_2_34_x86_64.whl (6.9 MB): CPython 3.9, manylinux (glibc 2.34+) x86-64
  • polars_candle-0.1.3-cp39-cp39-macosx_11_0_arm64.whl (5.8 MB): CPython 3.9, macOS 11.0+ ARM64
