polars-ml
Machine Learning Polars Plugin
Getting Started
Install from PyPI:
pip install polars-ml
Examples
Graph Namespace
import polars as pl
import polars_ml as plm

df = pl.DataFrame({
    'src_node': ['V1', 'V2', 'V3'],
    'neighbors': [['V2', 'V4'], ['V3'], ['V1']],
    'weights': [[1.0, 2.0], [0.5], [3.5]]
})

embedding_df = df.with_columns(
    plm.graph.node2vec(source_node=pl.col('src_node'),
                       neighbors=pl.col('neighbors'),
                       weights=pl.col('weights'),
                       is_directed=False,
                       p=1.0,
                       q=1.0,
                       max_neighbors=50,
                       embedding_size=64,
                       random_state=42,
                       verbose=True).alias('embedding')
).select('src_node', 'embedding')

print(embedding_df)
shape: (3, 2)
┌──────────┬───────────────────────────────────┐
│ src_node ┆ embedding                         │
│ ---      ┆ ---                               │
│ str      ┆ list[f32]                         │
╞══════════╪═══════════════════════════════════╡
│ V1       ┆ [0.521827, -0.314611, … -0.16515… │
│ V2       ┆ [0.335624, -0.041853, … 0.224424… │
│ V3       ┆ [0.274431, -0.210741, … -0.02325… │
└──────────┴───────────────────────────────────┘
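For readers unfamiliar with the `p` and `q` arguments: in the node2vec algorithm in general, they bias the second-order random walk that produces the node sequences used for training. The following is a minimal pure-Python sketch of that biasing rule, not the plugin's actual implementation (which builds on GRAPE). The function name `transition_weights`, the adjacency-dict layout, and the toy graph are assumptions made for illustration.

```python
def transition_weights(graph, prev, curr, p=1.0, q=1.0):
    """Unnormalized node2vec weights for leaving `curr`, having arrived from `prev`.

    `graph` maps each node to a dict {neighbor: edge_weight} (hypothetical layout).
    """
    weights = {}
    for nxt, w in graph[curr].items():
        if nxt == prev:
            # Returning to the previous node is scaled by 1/p.
            weights[nxt] = w / p
        elif nxt in graph.get(prev, {}):
            # Staying at distance 1 from the previous node keeps the raw weight.
            weights[nxt] = w
        else:
            # Moving further away from the previous node is scaled by 1/q.
            weights[nxt] = w / q
    return weights


# Toy undirected graph (illustrative data, not the DataFrame above).
graph = {
    'A': {'B': 1.0, 'C': 1.0},
    'B': {'A': 1.0, 'C': 1.0, 'D': 1.0},
    'C': {'A': 1.0, 'B': 1.0},
    'D': {'B': 1.0},
}

# The walk arrived at B from A. A large p discourages backtracking to A,
# and a small q favors moving outward to D.
w = transition_weights(graph, prev='A', curr='B', p=4.0, q=0.5)
print(w)
```

With `p=1.0` and `q=1.0`, as in the example above, all three cases reduce to the raw edge weight, so the walk behaves like an ordinary weighted random walk.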
Nltk Namespace
import polars as pl
import polars_ml as plm

df = pl.DataFrame({
    'words': ['the', 'bull', 'is', 'running', 'away']
})

df_stemmed = df.with_columns(
    plm.nltk.snowball_stem(pl.col('words'), language='english')
)

print(df_stemmed)
shape: (5, 1)
┌───────┐
│ words │
│ ---   │
│ str   │
╞═══════╡
│ the   │
│ bull  │
│ is    │
│ run   │
│ away  │
└───────┘
Sparse Namespace
import polars as pl
import polars_ml.sparse as ps

df = pl.DataFrame({
    'feature': [
        [0, 1, 0, 0, 5, 0],
        [2, 0, 0, 0, 3, 4],
        [0, 1],
        None
    ]
})

df_sparse = df.with_columns(
    ps.from_list(pl.col('feature')).alias('sparse_feature')
)

print(df_sparse)
shape: (4, 2)
┌─────────────┬─────────────────────────┐
│ feature     ┆ sparse_feature          │
│ ---         ┆ ---                     │
│ list[i64]   ┆ struct[3]               │
╞═════════════╪═════════════════════════╡
│ [0, 1, … 0] ┆ {6,[1, 4],[1, 5]}       │
│ [2, 0, … 4] ┆ {6,[0, 4, 5],[2, 3, 4]} │
│ [0, 1]      ┆ {2,[1],[1]}             │
│ null        ┆ {null,null,null}        │
└─────────────┴─────────────────────────┘
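The output above suggests that each sparse value is a struct holding the original list length, the positions of the non-zero entries, and their values. As a rough pure-Python illustration of that conversion (the helper `to_sparse` and the tuple layout are hypothetical and are not part of polars-ml):

```python
def to_sparse(dense):
    """Return (length, indices, values) for the non-zero entries, or None for a null row."""
    if dense is None:
        return None
    indices = [i for i, v in enumerate(dense) if v != 0]
    values = [dense[i] for i in indices]
    return len(dense), indices, values


# Same input rows as the 'feature' column above.
rows = [[0, 1, 0, 0, 5, 0], [2, 0, 0, 0, 3, 4], [0, 1], None]
sparse_rows = [to_sparse(r) for r in rows]

for row in sparse_rows:
    print(row)
```

Note one difference from the plugin output: this sketch returns `None` for a null row, whereas the table above shows a struct whose three fields are all null.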
df_sparse_norm = df_sparse.select('sparse_feature').with_columns(
    ps.normalize(pl.col('sparse_feature'), how='vertical', p=2.0).alias('sparse_feature_norm')
)

print(df_sparse_norm)
shape: (4, 2)
┌─────────────────────────┬───────────────────────────────────┐
│ sparse_feature          ┆ sparse_feature_norm               │
│ ---                     ┆ ---                               │
│ struct[3]               ┆ struct[3]                         │
╞═════════════════════════╪═══════════════════════════════════╡
│ {6,[1, 4],[1, 5]}       ┆ {6,[1, 4],[0.707107, 0.857493]}   │
│ {6,[0, 4, 5],[2, 3, 4]} ┆ {6,[0, 4, 5],[1.0, 0.514496, 1.0… │
│ {2,[1],[1]}             ┆ {2,[1],[0.707107]}                │
│ {null,null,null}        ┆ {null,null,null}                  │
└─────────────────────────┴───────────────────────────────────┘
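The numbers above are consistent with `how='vertical'` normalizing each feature index across rows rather than within a row. For example, index 4 holds 5 and 3, whose L2 norm is sqrt(34) ≈ 5.830952, which gives 0.857493 and 0.514496. The sketch below reproduces that arithmetic in plain Python under this assumption; `normalize_vertical` and the tuple layout are hypothetical and do not reflect the plugin's internals.

```python
from collections import defaultdict


def normalize_vertical(rows, p=2.0):
    """Divide each value by the p-norm of its feature index taken across all rows.

    `rows` holds (length, indices, values) tuples, or None for null rows.
    """
    # First pass: accumulate |v|^p per feature index over every non-null row.
    totals = defaultdict(float)
    for row in rows:
        if row is None:
            continue
        _, indices, values = row
        for i, v in zip(indices, values):
            totals[i] += abs(v) ** p
    norms = {i: t ** (1.0 / p) for i, t in totals.items()}

    # Second pass: scale each stored value by the norm of its index.
    result = []
    for row in rows:
        if row is None:
            result.append(None)
            continue
        length, indices, values = row
        scaled = [v / norms[i] for i, v in zip(indices, values)]
        result.append((length, indices, scaled))
    return result


# Same sparse rows as the 'sparse_feature' column above.
rows = [(6, [1, 4], [1, 5]), (6, [0, 4, 5], [2, 3, 4]), (2, [1], [1]), None]
normalized = normalize_vertical(rows)

for row in normalized:
    print(row)
```

Only stored (non-zero) entries contribute to a norm, so the division never hits a zero denominator in this sketch.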
Credits
- GRAPE for fast and scalable graph processing and random-walk-based embedding. See article here and library here.
- Rust Snowball Stemmer is taken from Tsoding's Seroost project (MIT). See here.
- Marco Edward Gorelli, for his Polars plugin tutorial.