Sequence Graph Transform (SGT) for Polars - Transform sequential data into weighted n-gram representations
polars-sgt
Sequence Graph Transform for Polars
Transform sequential data into weighted n-gram representations with Polars.
polars-sgt brings Sequence Graph Transform (SGT) to Polars, enabling you to:
- ✅ Transform sequences into weighted n-gram features
- ✅ Capture temporal patterns with time-based weighting
- ✅ Apply flexible normalization strategies (L1, L2, or none)
- ✅ Handle datetime, date, duration, and numeric time columns
- ✅ Implemented in Rust for speed
- ✅ Compatible with Polars lazy evaluation and streaming
What is SGT?
Sequence Graph Transform converts sequential data (like user clickstreams, sensor readings, or transaction histories) into weighted n-gram representations. It captures:
- Sequential patterns: Unigrams, bigrams, trigrams, and higher-order n-grams
- Temporal dynamics: Time-based weighting with multiple decay functions
- Normalized features: L1/L2 normalization for comparable feature spaces
Perfect for:
- User behavior analysis
- Time series feature engineering
- Sequential pattern mining
- Anomaly detection in sequences
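To make the idea concrete, here is a toy, pure-Python illustration of weighted n-gram features for a single sequence. It is not the polars-sgt implementation. The weighting rule (each occurrence weighted by exp(-alpha × span), where span is the time between the n-gram's first and last event) and the final L1 normalization are assumptions chosen for illustration, and the helper name `weighted_ngrams` is hypothetical.

```python
import math
from collections import defaultdict


def weighted_ngrams(states, times, kappa=2, alpha=1.0):
    """Toy weighted n-gram features for one sequence (illustrative only).

    Assumptions, not taken from polars-sgt:
    - every contiguous n-gram of length 1..kappa is counted;
    - each occurrence is weighted by exp(-alpha * span), where span is the
      time between its first and last event (so unigrams get weight 1);
    - the summed weights are L1-normalized at the end.
    """
    weights = defaultdict(float)
    for n in range(1, kappa + 1):
        for i in range(len(states) - n + 1):
            key = " -> ".join(states[i:i + n])
            span = times[i + n - 1] - times[i]
            weights[key] += math.exp(-alpha * span)
    total = sum(weights.values())
    return {key: w / total for key, w in weights.items()}


# Same events as user 1 in the Quick Start example below
features = weighted_ngrams(
    ["login", "view_product", "purchase"], [0, 10, 20], kappa=2, alpha=0.1
)
for key, w in sorted(features.items()):
    print(f"{key}: {w:.3f}")
```

With kappa=2 the result holds the three unigrams plus the two adjacent bigrams; the bigrams get lower weight because their events are 10 time units apart.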
Installation
Install polars-sgt from PyPI:
```shell
pip install polars-sgt
```
Quick Start
Basic Example
```python
import polars as pl
import polars_sgt as sgt

# User clickstream data
df = pl.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 2],
    "action": ["login", "view_product", "purchase", "login", "view_product", "logout"],
    "timestamp": [0, 10, 20, 0, 5, 15],
})

# Generate unigrams and bigrams (kappa=2) with exponential time decay
result = df.select(
    sgt.sgt_transform(
        "user_id",
        "action",
        time_col="timestamp",
        kappa=2,  # n-grams up to length 2: unigrams + bigrams
        time_penalty="exponential",
        alpha=0.1,
        mode="l1",  # L1 normalization
    ).alias("sgt_features")
)

# Extract features: one row per (sequence, n-gram) pair
features = result.select([
    pl.col("sgt_features").struct.field("sequence_id"),
    pl.col("sgt_features").struct.field("ngram_keys").alias("ngrams"),
    pl.col("sgt_features").struct.field("value").alias("weights"),
]).explode(["ngrams", "weights"])
print(features)
```
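Alternatively, build n-grams up to trigrams with inverse time decay and L2 normalization, then unnest the struct into columns (this reuses `df` from above):
```python
result = df.select(
    sgt.sgt_transform(
        "user_id",
        "action",
        time_col="timestamp",  # numeric time column, so no deltatime is needed
        kappa=3,  # n-grams up to length 3
        time_penalty="inverse",
        mode="l2",
        alpha=0.5,
    ).alias("struct_type")
)
out = (
    result
    .unnest("struct_type")
    .explode(["ngram_keys", "value"])
    # Drop unigrams: a key split on "->" has a single part only for unigrams
    .filter(pl.col("ngram_keys").str.split("->").list.len() > 1)
)
```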
With DateTime Columns
```python
from datetime import datetime

import polars as pl
import polars_sgt as sgt

df = pl.DataFrame({
    "session_id": ["A", "A", "A", "A"],
    "event": ["start", "click", "scroll", "exit"],
    "time": [
        datetime(2024, 1, 1, 10, 0),
        datetime(2024, 1, 1, 10, 5),
        datetime(2024, 1, 1, 10, 7),
        datetime(2024, 1, 1, 10, 15),
    ],
})

result = df.select(
    sgt.sgt_transform(
        "session_id",
        "event",
        time_col="time",
        deltatime="m",  # measure time differences in minutes
        kappa=3,  # n-grams up to length 3
        time_penalty="inverse",
    )
)
```
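To see what `deltatime` and the "inverse" penalty do with these timestamps, the sketch below converts the gaps between consecutive events into minutes and applies weight = alpha / time_diff with the default alpha = 1.0. It assumes the gap is measured between adjacent events and simply divided by the chosen unit; the `UNITS` table and the `gaps_in_units` helper are hypothetical, and the calendar units ("month", "q", "y") are left out because their length varies.

```python
from datetime import datetime, timedelta

# Hypothetical mapping from fixed-length deltatime codes to timedeltas.
UNITS = {
    "s": timedelta(seconds=1),
    "m": timedelta(minutes=1),
    "h": timedelta(hours=1),
    "d": timedelta(days=1),
    "w": timedelta(weeks=1),
}


def gaps_in_units(times, deltatime="m"):
    """Differences between consecutive timestamps, expressed in deltatime units."""
    unit = UNITS[deltatime]
    # timedelta / timedelta returns a float
    return [(later - earlier) / unit for earlier, later in zip(times, times[1:])]


times = [
    datetime(2024, 1, 1, 10, 0),
    datetime(2024, 1, 1, 10, 5),
    datetime(2024, 1, 1, 10, 7),
    datetime(2024, 1, 1, 10, 15),
]
alpha = 1.0
gaps = gaps_in_units(times, "m")
inverse = [alpha / g for g in gaps]  # "inverse": weight = alpha / time_diff
print(gaps)     # [5.0, 2.0, 8.0]
print(inverse)  # [0.2, 0.5, 0.125]
```

Under this reading, the closely spaced click -> scroll transition (2 minutes) gets the largest weight.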
Lazy Evaluation & Streaming
```python
import polars as pl
import polars_sgt as sgt

result = (
    pl.scan_csv("large_sequences.csv")
    .with_columns(pl.col("timestamp").str.to_datetime())
    .select(
        sgt.sgt_transform(
            "user_id",
            "action",
            time_col="timestamp",
            kappa=2,
            deltatime="h",
        )
    )
    .collect(streaming=True)
)
```
Parameters
Required
- sequence_id_col: Column with sequence identifiers (groups)
- state_col: Column with state/event values

Optional
- time_col: Timestamp column (datetime, date, duration, or numeric)
- kappa: Maximum n-gram size (default: 1)
  - 1 = unigrams only
  - 2 = unigrams + bigrams
  - 3 = unigrams + bigrams + trigrams, etc.
- time_penalty: Time decay function (default: "inverse")
  - "inverse": weight = alpha / time_diff
  - "exponential": weight = exp(-alpha × time_diff)
  - "linear": weight = max(0, 1 - alpha × time_diff)
  - "power": weight = 1 / time_diff^beta
  - "none": No time penalty
- mode: Normalization mode (default: "l1")
  - "l1": Sum of weights = 1
  - "l2": L2 norm = 1
  - "none": No normalization
- length_sensitive: Apply length normalization (default: False)
- alpha: Time penalty scale parameter (default: 1.0)
- beta: Power parameter for the "power" penalty (default: 2.0)
- deltatime: Time unit for datetime columns: "s", "m", "h", "d", "w", "month", "q", "y"
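The time_penalty and mode options can be written out directly from the formulas in the list above. The sketch below is a plain-Python restatement for reasoning about parameter choices, not the Rust implementation. Three details are assumptions: "none" is modelled as a constant weight of 1, an all-zero vector is returned unchanged by `normalize`, and `length_sensitive` is not modelled. Note that "inverse" and "power" divide by time_diff, so a zero gap raises `ZeroDivisionError` here; how polars-sgt handles that case is not stated.

```python
import math


def time_penalty(dt, kind="inverse", alpha=1.0, beta=2.0):
    """Weight for a time difference dt, following the listed formulas."""
    if kind == "inverse":
        return alpha / dt
    if kind == "exponential":
        return math.exp(-alpha * dt)
    if kind == "linear":
        return max(0.0, 1.0 - alpha * dt)
    if kind == "power":
        return 1.0 / dt ** beta  # ** binds tighter than /, i.e. 1 / (dt ** beta)
    if kind == "none":
        return 1.0  # assumption: "no time penalty" means a constant weight of 1
    raise ValueError(f"unknown time_penalty: {kind!r}")


def normalize(values, mode="l1"):
    """Rescale weights so their L1 or L2 norm is 1; "none" leaves them as-is."""
    if mode == "none":
        return list(values)
    if mode == "l1":
        norm = sum(abs(v) for v in values)
    elif mode == "l2":
        norm = math.sqrt(sum(v * v for v in values))
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    # Guard against dividing by zero for an all-zero vector
    return [v / norm for v in values] if norm > 0 else list(values)


for kind in ("inverse", "exponential", "linear", "power", "none"):
    print(kind, round(time_penalty(2.0, kind, alpha=0.5), 4))
print(normalize([3.0, 4.0], "l2"))  # [0.6, 0.8]
```

The comparison loop shows how strongly each penalty discounts the same gap: with alpha = 0.5, "linear" already reaches zero at a gap of 2, while "exponential" keeps about 37% of the weight.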
Output
Returns a Struct with three fields:
- sequence_id: Original sequence identifier
- ngram_keys: List of n-gram strings (e.g., "login -> view -> purchase")
- value: List of corresponding weights
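Because each sequence has its own list of n-grams, a common next step is to align them into fixed-length vectors for a model. The sketch below does that in plain Python. It assumes the rows have already been pulled out of Polars as dictionaries with the three output fields (for example via `DataFrame.unnest(...)` followed by `DataFrame.to_dicts()`). The helper `to_feature_matrix` is hypothetical and the weights in `rows` are made-up toy values, not polars-sgt output.

```python
def to_feature_matrix(rows):
    """Align per-sequence (ngram_keys, value) lists into dense vectors.

    rows: list of dicts with keys "sequence_id", "ngram_keys", "value".
    Returns (vocab, matrix) where matrix maps sequence_id -> list of weights
    ordered like vocab; n-grams absent from a sequence get 0.0.
    """
    vocab = sorted({key for row in rows for key in row["ngram_keys"]})
    index = {key: i for i, key in enumerate(vocab)}
    matrix = {}
    for row in rows:
        vec = [0.0] * len(vocab)
        for key, weight in zip(row["ngram_keys"], row["value"]):
            vec[index[key]] += weight
        matrix[row["sequence_id"]] = vec
    return vocab, matrix


# Toy input in the shape of the output struct (weights are made up)
rows = [
    {"sequence_id": 1, "ngram_keys": ["login", "login -> view_product"], "value": [0.7, 0.3]},
    {"sequence_id": 2, "ngram_keys": ["login", "logout"], "value": [0.5, 0.5]},
]
vocab, matrix = to_feature_matrix(rows)
print(vocab)      # ['login', 'login -> view_product', 'logout']
print(matrix[1])  # [0.7, 0.3, 0.0]
print(matrix[2])  # [0.5, 0.0, 0.5]
```

Sorting the vocabulary keeps the column order deterministic, so vectors built from separate runs line up as long as the same vocabulary is reused.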
Additional DateTime Utilities
While SGT is the primary focus, polars-sgt also includes helpful datetime utilities from the original polars-xdt:
- Timezone conversions
- Localized date formatting
- Julian date conversion
- Month delta calculations
See the full API documentation for details.
Author & Acknowledgments
Author: Zedd (lytran14789@gmail.com)
Special Thanks: This project is built upon polars-xdt created by Marco Gorelli. We are grateful for his excellent foundation.
License
MIT
Distribution files
All files for release 0.2.5 were uploaded via twine/6.1.0 (CPython/3.13.7) using Trusted Publishing. Each file carries a Sigstore attestation (statement type https://in-toto.io/Statement/v1, predicate type https://docs.pypi.org/attestations/publish/v1) published by the CI.yml workflow on 4ursmile/polars-sgt (owner https://github.com/4ursmile, public access) from refs/tags/v0.2.5 at commit 1bab5a9ed9897f77ab0354a3cde718a8d50219f1, built on a github-hosted runner triggered by push, with token issuer https://token.actions.githubusercontent.com.

| File | Size | Tags | Sigstore transparency entry |
|---|---|---|---|
| polars_sgt-0.2.5.tar.gz | 1.0 MB | Source | 910427165 |
| polars_sgt-0.2.5-pp310-pypy310_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl | 6.5 MB | PyPy, manylinux: glibc 2.17+ ARM64 | 910427228 |
| polars_sgt-0.2.5-pp39-pypy39_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl | 6.5 MB | PyPy, manylinux: glibc 2.17+ ARM64 | 910427256 |
| polars_sgt-0.2.5-cp39-abi3-win_amd64.whl | 6.2 MB | CPython 3.9+, Windows x86-64 | 910427212 |
| polars_sgt-0.2.5-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl | 7.1 MB | CPython 3.9+, manylinux: glibc 2.17+ x86-64 | 910427194 |
| polars_sgt-0.2.5-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl | 6.5 MB | CPython 3.9+, manylinux: glibc 2.17+ ARM64 | 910427243 |
| polars_sgt-0.2.5-cp39-abi3-macosx_11_0_arm64.whl | 5.9 MB | CPython 3.9+, macOS 11.0+ ARM64 | 910427266 |
| polars_sgt-0.2.5-cp39-abi3-macosx_10_12_x86_64.whl | 6.2 MB | CPython 3.9+, macOS 10.12+ x86-64 | 910427178 |

File hashes

| File | SHA256 | MD5 | BLAKE2b-256 |
|---|---|---|---|
| polars_sgt-0.2.5.tar.gz | e8fa3f6d2938754a948e6244b0be2fef245b224904839d0face1e310e32bc332 | ef3b13bf38bc4b682b4d79dcb242fa36 | 1c9541acae23dddf33fb15f71252946f2ab61231d7d64c1d5b5e5a14d0eaff56 |
| polars_sgt-0.2.5-pp310-pypy310_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl | ed847cc3973379030b5975a92c4c6062f6e5251f364f4c447eb4b7eefaec6e70 | 2e8f06558441c998b72740e5b4909927 | 889b546d189f6d4e315a1dc2b44867e327fd0dbdab3cb7c102493c9d8e4c89b3 |
| polars_sgt-0.2.5-pp39-pypy39_pp73-manylinux_2_17_aarch64.manylinux2014_aarch64.whl | 9336e466199b2070eef0548bdf27b42cc9089f58ffe7ebef7d5b4ee2bc408a40 | edb97c482bdcd45374ce2b27ee907aae | 659760db19482aa2f3b05d6a0fcc5bfb16167e9eb863f3c98805f0870c1dcb22 |
| polars_sgt-0.2.5-cp39-abi3-win_amd64.whl | 8b30ac87518158f9c5124ea81b6c1e9ef5130acc0b1a211a542df770da29e18f | 164a6c753debe3a4e86da4db5ce267eb | 9117b683ca757998a45308400266ffaafdb4e1908116092f6bf4e8989e31506e |
| polars_sgt-0.2.5-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl | bd8df211862b5f8651c1743d70a7835bcab995455974992683feb7e8e9514686 | 289824ca55ad0d7f29256ee947921ee5 | 08ba1c3c2205c7e7cc8a56fc958363458a0eb72d124a1a74f4194e7dd30d9bfd |
| polars_sgt-0.2.5-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl | 8a535556d3746dd1864a117a2fc0c6debe7808f46ca3edae633a3750cc36bf37 | fbde8cfa4bd58998cb05f8357d53dbc9 | 7a3a41eee15aab56757cdb239fbb39083948108027f66fd0d708b3705fd47229 |
| polars_sgt-0.2.5-cp39-abi3-macosx_11_0_arm64.whl | eea2e81bbd0e5e701519ea65bc7422ed655cae483251434dae3be03a356096b6 | 9e4b16fd4f0429f88b229ae169e3bd25 | 531343ad6b02faed3c877421e05f4e68f98bf2fc0ac247633f80047526d4cc6f |
| polars_sgt-0.2.5-cp39-abi3-macosx_10_12_x86_64.whl | 53ed9c689919e0b39523507a3243cedc3dc72de5fc7d850f286e413e7d32e757 | 9b3436563b2ae2d2aa9788d570aa9620 | f8c50f52c792791252181042f655e4ff1f1837f8d48968796ea7e68f96b38efd |