High-performance graph analysis and pattern mining extension for Polars

PolarsGrouper

PolarsGrouper is a Rust-based extension for Polars that provides efficient graph analysis capabilities, with a focus on component grouping and network analysis.

Core Features

Component Grouping

  • super_merger: Easy-to-use wrapper for grouping connected components
  • super_merger_weighted: Component grouping with weight thresholds
  • Efficient implementation using Rust and Polars
  • Works with both eager and lazy Polars DataFrames

Additional Graph Analytics

  • Shortest Path Analysis: Find shortest paths between nodes
  • PageRank: Calculate node importance scores
  • Betweenness Centrality: Identify key bridge nodes
  • Association Rules: Discover item relationships and patterns

Installation

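pip install polars-grouper

# For development (from a clone of the repository):
python -m venv .venv
source .venv/bin/activate
pip install maturin  # build tool that provides the `maturin develop` command
maturin develop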

Usage Examples

Basic Component Grouping

The core functionality uses super_merger to identify connected components:

import polars as pl
from polars_grouper import super_merger

df = pl.DataFrame({
    "from": ["A", "B", "C", "D", "E", "F"],
    "to": ["B", "C", "A", "E", "F", "D"],
    "value": [1, 2, 3, 4, 5, 6]
})

result = super_merger(df, "from", "to")
print(result)
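
To make the grouping easier to interpret, here is a minimal pure-Python sketch of connected-component labelling with union-find. It is not the library's Rust implementation: the function name connected_components and the integer labels are assumptions made for illustration, and the column name and label values that super_merger actually returns may differ.

```python
def connected_components(edges):
    """Label every node with a component id using union-find."""
    parent = {}

    def find(node):
        # Register unseen nodes as their own root, then walk up to the root.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for src, dst in edges:
        root_src, root_dst = find(src), find(dst)
        if root_src != root_dst:
            parent[root_dst] = root_src  # merge the two components

    # Hypothetical labelling: number the roots in first-seen order.
    ids = {}
    return {node: ids.setdefault(find(node), len(ids)) for node in parent}


# Same edge list as the "from"/"to" columns in the example above.
edges = list(zip(["A", "B", "C", "D", "E", "F"],
                 ["B", "C", "A", "E", "F", "D"]))
labels = connected_components(edges)
print(labels)
# {'A': 0, 'B': 0, 'C': 0, 'D': 1, 'E': 1, 'F': 1}
```

On this edge list, A, B and C form one component and D, E and F form another, so every row of the example DataFrame belongs to one of two groups.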

Weighted Component Grouping

When edge weights matter, super_merger_weighted takes a weight column and a weight_threshold:

import polars as pl
from polars_grouper import super_merger_weighted

df = pl.DataFrame({
    "from": ["A", "B", "C", "D", "E"],
    "to": ["B", "C", "D", "E", "A"],
    "weight": [0.9, 0.2, 0.05, 0.8, 0.3]
})

result = super_merger_weighted(
    df, 
    "from", 
    "to", 
    "weight",
    weight_threshold=0.3
)
print(result)
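
The README does not spell out how weight_threshold is applied, so the following pure-Python sketch states one plausible reading as an explicit assumption: an edge joins two components only if its weight is greater than or equal to the threshold. The function name weighted_components is hypothetical, and the library may treat the boundary or the direction of the comparison differently.

```python
def weighted_components(edges, weight_threshold):
    """Group nodes, merging only across edges that pass the threshold."""
    parent = {}

    def find(node):
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for src, dst, weight in edges:
        # Every node is kept, even if all of its edges are filtered out.
        parent.setdefault(src, src)
        parent.setdefault(dst, dst)
        # ASSUMPTION: keep edges with weight >= threshold (not confirmed by the source).
        if weight >= weight_threshold:
            parent[find(dst)] = find(src)

    groups = {}
    for node in parent:
        groups.setdefault(find(node), set()).add(node)
    return sorted(sorted(group) for group in groups.values())


# Same edges and weights as the example above.
edges = [("A", "B", 0.9), ("B", "C", 0.2), ("C", "D", 0.05),
         ("D", "E", 0.8), ("E", "A", 0.3)]
groups = weighted_components(edges, weight_threshold=0.3)
print(groups)
# [['A', 'B', 'D', 'E'], ['C']]
```

Under this assumption the weak edges B-C and C-D are ignored, which isolates C. If the comparison were strict, the E-A edge (weight exactly 0.3) would also be dropped and the result would split into {A, B}, {C} and {D, E}.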

Additional Graph Analytics

Shortest Path Analysis

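Find shortest paths between nodes. calculate_shortest_path is used as a Polars expression inside select; the result is aliased as a single column and then expanded with unnest:

import polars as pl
from polars_grouper import calculate_shortest_path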

df = pl.DataFrame({
    "from": ["A", "A", "B", "C"],
    "to": ["B", "C", "C", "D"],
    "weight": [1.0, 2.0, 1.0, 1.5]
})

paths = df.select(
    calculate_shortest_path(
        pl.col("from"),
        pl.col("to"),
        pl.col("weight"),
        directed=False
    ).alias("paths")
).unnest("paths")
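
As a reference for what a shortest-path computation produces on this data, here is a small single-source Dijkstra sketch in pure Python. It is an illustration only: the name shortest_distances and the single-source output are assumptions, and the columns returned by calculate_shortest_path are not documented in this README.

```python
import heapq


def shortest_distances(edges, source, directed=False):
    """Dijkstra's algorithm: cheapest distance from source to each reachable node."""
    graph = {}
    for src, dst, weight in edges:
        graph.setdefault(src, []).append((dst, weight))
        graph.setdefault(dst, [])
        if not directed:
            graph[dst].append((src, weight))  # mirror the edge

    dist = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry
        for neighbor, weight in graph.get(node, []):
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return dist


# Same weighted edges as the example above.
edges = [("A", "B", 1.0), ("A", "C", 2.0), ("B", "C", 1.0), ("C", "D", 1.5)]
from_a = shortest_distances(edges, "A", directed=False)
print(from_a)
# {'A': 0.0, 'B': 1.0, 'C': 2.0, 'D': 3.5}
```

With directed=False, edges can be traversed in both directions, so D can also reach A; with directed=True, D has no outgoing edges in this data and reaches only itself.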

PageRank Calculation

Calculate node importance:

import polars as pl
from polars_grouper import page_rank

df = pl.DataFrame({
    "from": ["A", "A", "B", "C", "D"],
    "to": ["B", "C", "C", "A", "B"]
})

rankings = df.select(
    page_rank(
        pl.col("from"),
        pl.col("to"),
        damping_factor=0.85
    ).alias("pagerank")
).unnest("pagerank")
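
To show what the damping_factor parameter controls, the sketch below runs the standard PageRank power iteration in pure Python. The name pagerank_sketch, the max_iter and tol parameters, and the uniform handling of nodes without outgoing links are assumptions for illustration; the library's iteration settings and output columns may differ.

```python
def pagerank_sketch(edges, damping_factor=0.85, max_iter=200, tol=1e-12):
    """Power iteration for PageRank on a directed edge list."""
    out_links = {}
    for src, dst in edges:
        out_links.setdefault(src, []).append(dst)
        out_links.setdefault(dst, [])

    n = len(out_links)
    ranks = {node: 1.0 / n for node in out_links}
    for _ in range(max_iter):
        # Rank held by nodes without outgoing links is spread uniformly.
        dangling = sum(ranks[node] for node, targets in out_links.items() if not targets)
        base = (1.0 - damping_factor) / n + damping_factor * dangling / n
        new_ranks = {node: base for node in out_links}
        for node, targets in out_links.items():
            for target in targets:
                new_ranks[target] += damping_factor * ranks[node] / len(targets)
        delta = sum(abs(new_ranks[node] - ranks[node]) for node in out_links)
        ranks = new_ranks
        if delta < tol:
            break
    return ranks


# Same directed edges as the example above.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "A"), ("D", "B")]
ranks = pagerank_sketch(edges, damping_factor=0.85)
for node in sorted(ranks, key=ranks.get, reverse=True):
    print(node, round(ranks[node], 4))
```

On this graph C ranks highest, followed by A, B and D. Node D has no incoming edges, so its score is just the teleport share (1 - 0.85) / 4 = 0.0375; a lower damping factor pulls all scores toward the uniform value 0.25.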

Association Rule Mining

Discover item relationships:

import polars as pl
from polars_grouper import graph_association_rules

transactions = pl.DataFrame({
    "transaction_id": [1, 1, 1, 2, 2, 3],
    "item_id": ["A", "B", "C", "B", "D", "A"],
    "frequency": [1, 2, 1, 1, 1, 1]
})

rules = transactions.select(
    graph_association_rules(
        pl.col("transaction_id"),
        pl.col("item_id"),
        pl.col("frequency"),
        min_support=0.1
    ).alias("rules")
).unnest("rules")
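
For readers unfamiliar with min_support, the following pure-Python sketch computes classic pairwise support and confidence on the same transactions. This is a swapped-in textbook technique, not the graph-based method that graph_association_rules implements: the name pair_rules is hypothetical, and the frequency column is ignored here because the README does not say how the library uses it.

```python
from itertools import permutations


def pair_rules(rows, min_support=0.1):
    """Return {(antecedent, consequent): (support, confidence)} for item pairs."""
    baskets = {}
    for transaction_id, item_id in rows:
        baskets.setdefault(transaction_id, set()).add(item_id)
    n = len(baskets)

    item_count = {}
    pair_count = {}
    for items in baskets.values():
        for item in items:
            item_count[item] = item_count.get(item, 0) + 1
        # Ordered pairs, so A -> B and B -> A are separate rules.
        for pair in permutations(sorted(items), 2):
            pair_count[pair] = pair_count.get(pair, 0) + 1

    rules = {}
    for (antecedent, consequent), count in pair_count.items():
        support = count / n  # share of transactions containing both items
        if support >= min_support:
            confidence = count / item_count[antecedent]
            rules[(antecedent, consequent)] = (support, confidence)
    return rules


# Same transaction_id / item_id pairs as the example above.
rows = list(zip([1, 1, 1, 2, 2, 3], ["A", "B", "C", "B", "D", "A"]))
rules = pair_rules(rows, min_support=0.1)
print(rules[("C", "A")])  # C appears only in transaction 1, together with A
# (0.3333333333333333, 1.0)
```

Every co-occurring pair here appears in exactly one of the three transactions, so all rules have support 1/3 and pass min_support=0.1; raising the threshold to 0.5 would filter out all of them.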

Betweenness Centrality

Identify bridge nodes:

import polars as pl
from polars_grouper import betweenness_centrality

df = pl.DataFrame({
    "from": ["A", "A", "B", "C", "D", "E"],
    "to": ["B", "C", "C", "D", "E", "A"]
})

centrality = df.select(
    betweenness_centrality(
        pl.col("from"),
        pl.col("to"),
        normalized=True
    ).alias("centrality")
).unnest("centrality")
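
To illustrate what the normalized flag changes, here is a compact pure-Python sketch of Brandes' algorithm for unweighted graphs. It treats the edges as undirected, which is an assumption (the README does not say whether betweenness_centrality is directed), and the name betweenness_sketch is hypothetical.

```python
from collections import deque


def betweenness_sketch(edges, normalized=True):
    """Brandes' algorithm on an undirected, unweighted edge list."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, set()).add(dst)
        graph.setdefault(dst, set()).add(src)

    scores = {node: 0.0 for node in graph}
    for source in graph:
        # Breadth-first search that counts shortest paths from source.
        order = []
        predecessors = {node: [] for node in graph}
        path_count = {node: 0 for node in graph}
        distance = {node: -1 for node in graph}
        path_count[source], distance[source] = 1, 0
        queue = deque([source])
        while queue:
            node = queue.popleft()
            order.append(node)
            for neighbor in graph[node]:
                if distance[neighbor] < 0:
                    distance[neighbor] = distance[node] + 1
                    queue.append(neighbor)
                if distance[neighbor] == distance[node] + 1:
                    path_count[neighbor] += path_count[node]
                    predecessors[neighbor].append(node)

        # Accumulate dependencies from the farthest nodes back to the source.
        dependency = {node: 0.0 for node in graph}
        for node in reversed(order):
            for pred in predecessors[node]:
                share = path_count[pred] / path_count[node]
                dependency[pred] += share * (1.0 + dependency[node])
            if node != source:
                scores[node] += dependency[node]

    n = len(graph)
    scale = 0.5  # each undirected pair was counted from both endpoints
    if normalized and n > 2:
        scale = 1.0 / ((n - 1) * (n - 2))
    return {node: value * scale for node, value in scores.items()}


# Same edges as the example above, read as undirected.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "A")]
centrality = betweenness_sketch(edges, normalized=True)
for node in sorted(centrality):
    print(node, round(centrality[node], 4))
```

Under the undirected reading, A and C carry the most shortest paths (normalized score about 0.25 each), D and E carry fewer, and B lies on no shortest path between other nodes. With normalized=False the same scores are reported as raw path counts, for example 1.5 for A.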

Performance

The library is implemented in Rust for high performance:

  • Efficient memory usage
  • Fast computation for large graphs
  • Seamless integration with Polars' lazy evaluation

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Download files

Download the file for your platform.

Source Distribution

  • polars_grouper-0.3.0.tar.gz (30.1 kB): Source

Built Distributions

  • polars_grouper-0.3.0-cp38-abi3-win_amd64.whl (3.5 MB): CPython 3.8+, Windows x86-64
  • polars_grouper-0.3.0-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.0 MB): CPython 3.8+, manylinux (glibc 2.17+), x86-64
  • polars_grouper-0.3.0-cp38-abi3-manylinux_2_17_i686.manylinux2014_i686.whl (4.4 MB): CPython 3.8+, manylinux (glibc 2.17+), i686
  • polars_grouper-0.3.0-cp38-abi3-macosx_11_0_arm64.whl (3.4 MB): CPython 3.8+, macOS 11.0+, ARM64
  • polars_grouper-0.3.0-cp38-abi3-macosx_10_12_x86_64.whl (3.6 MB): CPython 3.8+, macOS 10.12+, x86-64
