polars-ml

Machine Learning Polars Plugin

Getting Started

Install from PyPI:

pip install polars-ml

Examples

Graph Namespace

import polars as pl
import polars_ml as plm

df = pl.DataFrame({
    'src_node': ['V1', 'V2', 'V3'],
    'neighbors': [['V2', 'V4'], ['V3'], ['V1']],
    'weights': [[1.0, 2.0], [0.5], [3.5]]
})

embedding_df = df.with_columns(
    plm.graph.node2vec(source_node=pl.col('src_node'),
                       neighbors=pl.col('neighbors'),
                       weights=pl.col('weights'),
                       is_directed=False,
                       p=1.0,
                       q=1.0,
                       max_neighbors=50,
                       embedding_size=64,
                       random_state=42,
                       verbose=True).alias('embedding')
).select('src_node', 'embedding')

print(embedding_df)
shape: (3, 2)
┌──────────┬───────────────────────────────────┐
│ src_node ┆ embedding                         │
│ ---      ┆ ---                               │
│ str      ┆ list[f32]                         │
╞══════════╪═══════════════════════════════════╡
│ V1       ┆ [0.521827, -0.314611, … -0.16515… │
│ V2       ┆ [0.335624, -0.041853, … 0.224424… │
│ V3       ┆ [0.274431, -0.210741, … -0.02325… │
└──────────┴───────────────────────────────────┘
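
To make the `p` and `q` arguments more concrete, here is a minimal pure-Python sketch of the second-order transition rule from the standard node2vec formulation, applied to the example graph above. It is an illustration only and is not taken from the polars-ml or GRAPE source; the helper name `transition_probs` and the chosen values `p=2.0`, `q=0.5` are assumptions made for this example.

```python
from collections import defaultdict

# Edge list equivalent to the example DataFrame (source, neighbor, weight).
edges = [("V1", "V2", 1.0), ("V1", "V4", 2.0), ("V2", "V3", 0.5), ("V3", "V1", 3.5)]

# Build a symmetric adjacency map, mirroring is_directed=False.
adj = defaultdict(dict)
for u, v, w in edges:
    adj[u][v] = w
    adj[v][u] = w


def transition_probs(adj, prev, cur, p, q):
    """Hypothetical helper: node2vec probabilities for the step after prev -> cur."""
    scores = {}
    for nxt, w in adj[cur].items():
        if nxt == prev:
            scores[nxt] = w / p   # returning to the previous node is scaled by 1/p
        elif nxt in adj[prev]:
            scores[nxt] = w       # staying at distance 1 from prev keeps the raw weight
        else:
            scores[nxt] = w / q   # moving further away from prev is scaled by 1/q
    total = sum(scores.values())
    return {node: s / total for node, s in scores.items()}


# The walk has just moved V2 -> V1; where does it go next?
probs = transition_probs(adj, prev="V2", cur="V1", p=2.0, q=0.5)
print(probs)  # {'V2': 0.0625, 'V4': 0.5, 'V3': 0.4375}
```

With `p=1.0` and `q=1.0`, as in the example call above, every branch reduces to the raw edge weight, so the walk is a plain weighted random walk.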

Nltk Namespace

import polars as pl
import polars_ml as plm


df = pl.DataFrame({
    'words': ['the', 'bull', 'is', 'running', 'away']
})

df_stemmed = df.with_columns(
    plm.nltk.snowball_stem(pl.col('words'), language='english')
)

print(df_stemmed)
shape: (5, 1)
┌───────┐
│ words │
│ ---   │
│ str   │
╞═══════╡
│ the   │
│ bull  │
│ is    │
│ run   │
│ away  │
└───────┘

Sparse Namespace

import polars as pl
import polars_ml.sparse as ps


df = pl.DataFrame({
    'feature': [
        [0, 1, 0, 0, 5, 0],
        [2, 0, 0, 0, 3, 4],
        [0, 1],
        None
    ]
})

df_sparse = df.with_columns(
    ps.from_list(pl.col('feature')).alias('sparse_feature')
)

print(df_sparse)
shape: (4, 2)
┌─────────────┬─────────────────────────┐
│ feature     ┆ sparse_feature          │
│ ---         ┆ ---                     │
│ list[i64]   ┆ struct[3]               │
╞═════════════╪═════════════════════════╡
│ [0, 1, … 0] ┆ {6,[1, 4],[1, 5]}       │
│ [2, 0, … 4] ┆ {6,[0, 4, 5],[2, 3, 4]} │
│ [0, 1]      ┆ {2,[1],[1]}             │
│ null        ┆ {null,null,null}        │
└─────────────┴─────────────────────────┘
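
The printed struct appears to hold three fields per row: the dense length, the positions of the non-zero entries, and their values. The pure-Python sketch below reproduces that layout for the same input. It only illustrates the encoding as inferred from the output above; `to_sparse` is a hypothetical helper, not part of polars-ml, and the real struct field names are not shown in this README.

```python
def to_sparse(dense):
    """Hypothetical helper: encode a dense list as (length, indices, values)."""
    if dense is None:
        # Mirror the {null,null,null} row printed for a null input.
        return (None, None, None)
    indices = [i for i, x in enumerate(dense) if x != 0]
    values = [dense[i] for i in indices]
    return (len(dense), indices, values)


rows = [
    [0, 1, 0, 0, 5, 0],
    [2, 0, 0, 0, 3, 4],
    [0, 1],
    None,
]
sparse_rows = [to_sparse(row) for row in rows]
for row in sparse_rows:
    print(row)
```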

df_sparse_norm = df_sparse.select('sparse_feature').with_columns(
    ps.normalize(pl.col('sparse_feature'), how='vertical', p=2.0).alias('sparse_feature_norm')
)
print(df_sparse_norm)
shape: (4, 2)
┌─────────────────────────┬───────────────────────────────────┐
│ sparse_feature          ┆ sparse_feature_norm               │
│ ---                     ┆ ---                               │
│ struct[3]               ┆ struct[3]                         │
╞═════════════════════════╪═══════════════════════════════════╡
│ {6,[1, 4],[1, 5]}       ┆ {6,[1, 4],[0.707107, 0.857493]}   │
│ {6,[0, 4, 5],[2, 3, 4]} ┆ {6,[0, 4, 5],[1.0, 0.514496, 1.0… │
│ {2,[1],[1]}             ┆ {2,[1],[0.707107]}                │
│ {null,null,null}        ┆ {null,null,null}                  │
└─────────────────────────┴───────────────────────────────────┘
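
Judging from the output above, `how='vertical'` with `p=2.0` divides each stored value by the L2 norm of its index position taken across all rows. The pure-Python sketch below implements that reading on `(length, indices, values)` tuples. This interpretation is an assumption inferred from the printed numbers rather than from the library source, and `normalize_vertical` is a hypothetical name.

```python
from collections import defaultdict


def normalize_vertical(rows, p=2.0):
    """Hypothetical helper: scale each value by the p-norm of its index across rows."""
    # First pass: accumulate |v|^p per index position over all non-null rows.
    sums = defaultdict(float)
    for _, indices, values in rows:
        if indices is None:
            continue
        for i, v in zip(indices, values):
            sums[i] += abs(v) ** p
    norms = {i: s ** (1.0 / p) for i, s in sums.items()}

    # Second pass: divide each stored value by the norm of its index position.
    result = []
    for length, indices, values in rows:
        if indices is None:
            result.append((None, None, None))  # null rows pass through unchanged
            continue
        scaled = [v / norms[i] if norms[i] else 0.0 for i, v in zip(indices, values)]
        result.append((length, indices, scaled))
    return result


sparse_rows = [
    (6, [1, 4], [1, 5]),
    (6, [0, 4, 5], [2, 3, 4]),
    (2, [1], [1]),
    (None, None, None),
]
normalized = normalize_vertical(sparse_rows, p=2.0)
for row in normalized:
    print(row)
```

For example, index 4 holds the values 5 and 3, so its norm is sqrt(34) and the scaled values are roughly 0.857493 and 0.514496, matching the table.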

Credits

  1. GRAPE for fast and scalable graph processing and random-walk-based embedding. See article here and library here.
  2. Rust Snowball Stemmer is taken from Tsoding's Seroost project (MIT). See here.
  3. Marco Edward Gorelli, for his Polars plugin tutorial.

Built Distributions

polars_ml-0.2.0-cp38-abi3-win_amd64.whl (4.1 MB)

Uploaded CPython 3.8+ Windows x86-64

polars_ml-0.2.0-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (5.4 MB)

Uploaded CPython 3.8+ manylinux: glibc 2.17+ x86-64
