Sequence Graph Transform (SGT) for Polars - Transform sequential data into weighted n-gram representations
polars-sgt
Sequence Graph Transform for Polars
Transform sequential data into powerful n-gram representations with Polars.
polars-sgt brings Sequence Graph Transform (SGT) to Polars, enabling you to:
- ✅ Transform sequences into weighted n-gram features
- ✅ Grouped Analysis: Apply SGT across subsets (e.g., by direction, metric) and merge into a single wide DataFrame
- ✅ Billion-Row Scale: Optimized Rust implementation with constant-time, O(1), lookups of time-decay weights
- ✅ Temporal Dynamics: Capture patterns with multiple decay functions across all n-gram transitions
- ✅ Flexible: Support for datetime, date, duration, and numeric time columns
- ✅ Lazy & Parallel: Fully compatible with Polars lazy evaluation and Rayon-backed parallel processing
What is SGT?
Sequence Graph Transform converts sequential data (like user clickstreams, sensor readings, or transaction histories) into weighted n-gram representations. Unlike traditional n-grams, SGT captures:
- Sequential patterns: Multi-transition dependencies (unigrams, bigrams, trigrams, and so on up to size kappa).
- Temporal dynamics: Weights decay based on time gaps between events.
- Normalized features: L1/L2 normalization for machine-learning-ready feature spaces.
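To make the three properties above concrete, here is a minimal pure-Python sketch of time-decayed, L1-normalized n-gram weights. It is an illustration only, not the polars-sgt implementation: the function name `weighted_ngrams`, the choice of contiguous n-grams, the exponential decay `exp(-alpha * gap)`, and the weight of 1 for unigrams are all assumptions made for this example.

```python
import math
from collections import defaultdict

def weighted_ngrams(states, times, kappa=2, alpha=0.1):
    """Return {ngram: weight} for contiguous n-grams of size 1..kappa.

    Assumption: each n-gram is weighted by exp(-alpha * elapsed_time),
    and the resulting weights are L1-normalized to sum to 1.
    """
    weights = defaultdict(float)
    for start in range(len(states)):
        for n in range(1, kappa + 1):
            end = start + n  # exclusive end index of the n-gram
            if end > len(states):
                break
            gap = times[end - 1] - times[start]  # time elapsed across the n-gram
            key = " -> ".join(states[start:end])
            weights[key] += math.exp(-alpha * gap)
    total = sum(weights.values())
    return {key: value / total for key, value in weights.items()}

features = weighted_ngrams(["login", "view", "purchase"], [1, 2, 10], kappa=2, alpha=0.1)
for ngram, weight in features.items():
    print(f"{ngram}: {weight:.3f}")
```

In this sketch the bigram login -> view (gap of 1) ends up with a larger weight than view -> purchase (gap of 8), which is the temporal-decay effect described above.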
Performance at Scale
Optimized for processing billions of rows:
- O(1) Weight Calculation: Uses cumulative product prefix arrays to calculate multi-transition time weights in constant time.
- Zero-Cost Abstraction: Written in Rust with Rayon for automatic multi-core utilization.
- Memory Efficient: Leverages Polars' arrow-backed memory management.
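The cumulative-product idea can be sketched in a few lines of Python. This is a simplified illustration of the general prefix-product technique, not the library's Rust code: the helper names `build_prefix` and `span_weight` are hypothetical, and the exponential per-transition weight is an assumption.

```python
import math

def build_prefix(transition_weights):
    """prefix[k] is the product of the first k transition weights (prefix[0] == 1)."""
    prefix = [1.0]
    for weight in transition_weights:
        prefix.append(prefix[-1] * weight)
    return prefix

def span_weight(prefix, i, j):
    """Combined weight of the transitions between event i and event j (i <= j).

    One division replaces a loop over j - i transitions, so the lookup is O(1).
    """
    return prefix[j] / prefix[i]

times = [1, 2, 10, 11]
alpha = 0.1
# One weight per transition between consecutive events (assumed exponential decay).
transition_weights = [math.exp(-alpha * (b - a)) for a, b in zip(times, times[1:])]
prefix = build_prefix(transition_weights)

print(span_weight(prefix, 0, 3))  # weight across all three transitions
print(span_weight(prefix, 1, 3))  # weight across the last two transitions
```

The division only works when every per-transition weight is strictly positive. A decay that can reach zero, or very long sequences whose products underflow, would need extra handling, for example working with sums of logarithms instead of products.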
Installation
pip install polars-sgt
Quick Start
1. High-Level API: sgt_transform_df
The sgt_transform_df function is the easiest way to generate SGT features. It handles unnesting, exploding, and pivoting into a wide format automatically.
Single Group (Default)
import polars as pl
import polars_sgt as sgt

df = pl.DataFrame({
    "user_id": ["A", "A", "A", "B", "B"],
    "action": ["login", "view", "purchase", "login", "view"],
    "time": [1, 2, 10, 1, 5],
})

# Generate wide-format features merged into one DataFrame
features = sgt.sgt_transform_df(
    df,
    sequence_id_col="user_id",
    state_col="action",
    time_col="time",
    kappa=2,
)
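For intuition about what "wide format" means here, the following pure-Python sketch pivots long (sequence, n-gram, weight) rows into one record per sequence. It does not call polars-sgt: the helper `pivot_wide`, the sample weights, the n-gram labels, and the choice to fill missing n-grams with 0.0 are assumptions for illustration, and the library's actual column names and fill values may differ.

```python
def pivot_wide(long_rows, id_col="user_id"):
    """Pivot (sequence_id, ngram, weight) rows into one dict per sequence.

    N-grams that a sequence never produced are filled with 0.0 (an assumption).
    """
    columns = sorted({ngram for _, ngram, _ in long_rows})
    by_id = {}
    for seq_id, ngram, weight in long_rows:
        row = by_id.setdefault(seq_id, dict.fromkeys(columns, 0.0))
        row[ngram] = weight
    return [{id_col: seq_id, **row} for seq_id, row in by_id.items()]

# Hypothetical long-format weights, one row per (sequence, n-gram) pair.
long_rows = [
    ("A", "login", 0.4),
    ("A", "login -> view", 0.6),
    ("B", "login", 1.0),
]

for record in pivot_wide(long_rows):
    print(record)
```

Each sequence ends up as a single row with one column per n-gram, which is the shape most machine-learning libraries expect as a feature matrix.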
Grouped Sequence Analysis
Calculate separate SGT features for different groups (e.g., event types or directions) and merge them into one wide DataFrame.
# Calculate SGT features for each 'direction' and 'metric'.
# Note: df must also contain 'direction' and 'metric' columns,
# which the DataFrame in the previous example does not have.
result = sgt.sgt_transform_df(
    df,
    sequence_id_col="user_id",
    state_col="action",
    time_col="time",
    group_cols=["direction", "metric"],
    kappa=3,
    time_penalty="exponential",
    alpha=0.7,
    group_name="analysis",
)
# Columns: ['user_id', 'analysis-buy-p_login', 'analysis-sell-p_login', ...]
2. Expression API: sgt_transform
For more control or integration into complex pipelines, use the expression-based API.
# Basic expression usage (returns a struct)
result = df.select(
    sgt.sgt_transform(
        "user_id",
        "action",
        time_col="time",
        kappa=2,
        time_penalty="exponential",
        alpha=0.1,
        mode="l1",
    ).alias("sgt_features")
)

# Extract and explode
features = result.select([
    pl.col("sgt_features").struct.field("sequence_id"),
    pl.col("sgt_features").struct.field("ngram_keys").alias("ngrams"),
    pl.col("sgt_features").struct.field("value").alias("weights"),
]).explode(["ngrams", "weights"])
With DateTime Columns
from datetime import datetime

df = pl.DataFrame({
    "session_id": ["A", "A", "A", "A"],
    "event": ["start", "click", "scroll", "exit"],
    "time": [
        datetime(2024, 1, 1, 10, 0),
        datetime(2024, 1, 1, 10, 5),
        datetime(2024, 1, 1, 10, 7),
        datetime(2024, 1, 1, 10, 15),
    ],
})

result = df.select(
    sgt.sgt_transform(
        "session_id",
        "event",
        time_col="time",
        deltatime="m",  # unit: minutes
        kappa=3,
    )
)
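The effect of a time unit such as deltatime="m" can be pictured as converting the gaps between consecutive timestamps into that unit before any decay is applied. The sketch below shows this with the standard library only. The helper `gaps_in_unit` and the unit table are hypothetical: only "m" and "h" appear in the examples in this README, so the "s" and "d" codes are assumptions, and the library's internal conversion may differ.

```python
from datetime import datetime, timedelta

# Assumed unit codes; only "m" and "h" are shown in this README.
UNITS = {
    "s": timedelta(seconds=1),
    "m": timedelta(minutes=1),
    "h": timedelta(hours=1),
    "d": timedelta(days=1),
}

def gaps_in_unit(timestamps, deltatime="m"):
    """Return the gaps between consecutive timestamps, expressed in the given unit."""
    unit = UNITS[deltatime]
    return [(later - earlier) / unit for earlier, later in zip(timestamps, timestamps[1:])]

timestamps = [
    datetime(2024, 1, 1, 10, 0),
    datetime(2024, 1, 1, 10, 5),
    datetime(2024, 1, 1, 10, 7),
    datetime(2024, 1, 1, 10, 15),
]

print(gaps_in_unit(timestamps, "m"))  # gaps in minutes
print(gaps_in_unit(timestamps, "h"))  # the same gaps in hours
```

Choosing the unit matters because the decay parameter is applied to the numeric gap: a gap of 5 minutes is 5.0 in minutes but roughly 0.083 in hours, so the same decay setting penalizes it very differently.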
Lazy Evaluation & Streaming
result = (
    pl.scan_csv("large_sequences.csv")
    .with_columns(pl.col("timestamp").str.to_datetime())
    .select(
        sgt.sgt_transform(
            "user_id",
            "action",
            time_col="timestamp",
            kappa=2,
            deltatime="h",
        )
    )
    .collect(engine="streaming")
)
API Reference
sgt.sgt_transform_df
The recommended high-level entry point. Returns a wide-format DataFrame.
- df: Input DataFrame or LazyFrame.
- sequence_id_col: Column(s) identifying sequences.
- state_col: Column containing states/events.
- time_col: Optional timestamp column.
- group_cols: Optional column(s) to group by before SGT.
- kappa: Maximum n-gram size.
- mode: Normalization ("l1", "l2", "none").
- time_penalty: Decay function ("inverse", "exponential", "linear", "power", "none").
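The time_penalty and mode options can be pictured with the sketch below. The formulas are common textbook forms chosen for illustration and are assumptions: this README does not state the exact definitions polars-sgt uses, nor whether alpha applies to every decay type. The names `DECAYS` and `normalize` are hypothetical.

```python
import math

# Assumed decay shapes; each maps a time gap dt >= 0 to a weight in [0, 1].
DECAYS = {
    "inverse": lambda dt, alpha: 1.0 / (1.0 + alpha * dt),
    "exponential": lambda dt, alpha: math.exp(-alpha * dt),
    "linear": lambda dt, alpha: max(0.0, 1.0 - alpha * dt),
    "power": lambda dt, alpha: (1.0 + dt) ** (-alpha),
    "none": lambda dt, alpha: 1.0,
}

def normalize(weights, mode="l1"):
    """Scale weights so their L1 or L2 norm is 1; "none" returns them unchanged."""
    if mode == "none":
        return list(weights)
    if mode == "l1":
        norm = sum(abs(w) for w in weights)
    else:  # "l2"
        norm = math.sqrt(sum(w * w for w in weights))
    return [w / norm for w in weights] if norm else list(weights)

# Compare how each assumed decay treats a gap of 2 time units with alpha = 0.5.
for name, decay in DECAYS.items():
    print(name, round(decay(2.0, 0.5), 4))

print(normalize([1.0, 3.0], mode="l1"))
print(normalize([3.0, 4.0], mode="l2"))
```

All of these shapes return 1 for a zero gap and shrink as the gap grows, except "none", which ignores time entirely.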
sgt.sgt_transform (Expression)
Returns a struct with sequence_id, ngram_keys, and value.
df.select(
    sgt.sgt_transform("user", "action", kappa=2).alias("sgt")
).unnest("sgt")
Author & Acknowledgments
Author: Zedd (lytran14789@gmail.com)
Special Thanks: Built upon polars-xdt by Marco Gorelli.
License
MIT