Sequence Graph Transform (SGT) for Polars - Transform sequential data into weighted n-gram representations

polars-sgt

Sequence Graph Transform for Polars

Transform sequential data into powerful n-gram representations with Polars.

polars-sgt brings Sequence Graph Transform (SGT) to Polars, enabling you to:

  • ✅ Transform sequences into weighted n-gram features
  • ✅ Capture temporal patterns with time-based weighting
  • ✅ Apply flexible normalization strategies (L1, L2, or none)
  • ✅ Handle datetime, date, duration, and numeric time columns
  • ✅ Run the transform in a core written in Rust for speed
  • ✅ Work with Polars lazy evaluation and streaming

What is SGT?

Sequence Graph Transform converts sequential data (like user clickstreams, sensor readings, or transaction histories) into weighted n-gram representations. It captures:

  • Sequential patterns: Unigrams, bigrams, trigrams, and higher-order n-grams
  • Temporal dynamics: Time-based weighting with multiple decay functions
  • Normalized features: L1/L2 normalization for comparable feature spaces
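
To make the n-gram terminology concrete, here is a minimal pure-Python sketch that enumerates n-grams up to a maximum size. It assumes contiguous windows and a " -> " separator purely for illustration. The helper name `contiguous_ngrams` is hypothetical and is not part of polars-sgt, whose internal enumeration may differ.

```python
from typing import List


def contiguous_ngrams(states: List[str], max_n: int, sep: str = " -> ") -> List[str]:
    """Enumerate contiguous n-grams of size 1..max_n (illustrative only)."""
    if max_n < 1:
        raise ValueError("max_n must be at least 1")
    ngrams = []
    for n in range(1, max_n + 1):
        # Slide a window of width n across the sequence.
        for start in range(len(states) - n + 1):
            ngrams.append(sep.join(states[start:start + n]))
    return ngrams


# Unigrams first, then bigrams, in order of appearance.
print(contiguous_ngrams(["login", "view_product", "purchase"], max_n=2))
```

With `max_n=2` the sketch yields the three unigrams followed by the two bigrams; a window wider than the sequence simply contributes nothing.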

Typical applications:

  • User behavior analysis
  • Time series feature engineering
  • Sequential pattern mining
  • Anomaly detection in sequences

Installation

Install polars-sgt from PyPI:

pip install polars-sgt

Quick Start

Basic Example

import polars as pl
import polars_sgt as sgt

# User clickstream data
df = pl.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 2],
    "action": ["login", "view_product", "purchase", "login", "view_product", "logout"],
    "timestamp": [0, 10, 20, 0, 5, 15],
})

# Generate unigrams and bigrams with exponential time decay
result = df.select(
    sgt.sgt_transform(
        "user_id",
        "action",
        time_col="timestamp",
        kappa=2,  # n-grams up to length 2 (unigrams + bigrams)
        time_penalty="exponential",
        alpha=0.1,
        mode="l1",  # L1 normalization
    ).alias("sgt_features")
)

# Extract features
features = result.select([
    pl.col("sgt_features").struct.field("sequence_id"),
    pl.col("sgt_features").struct.field("ngram_keys").alias("ngrams"),
    pl.col("sgt_features").struct.field("value").alias("weights"),
]).explode(["ngrams", "weights"])

print(features)

# Alternatively, unnest the struct and explode it in a single chain
result = df.select(
    sgt.sgt_transform(
        "user_id",
        "action",
        time_col="timestamp",  # numeric column, so no deltatime is passed
        kappa=3,  # n-grams up to length 3
        time_penalty="inverse",
        mode="l2",
        alpha=0.5,
    ).alias("struct_type")
)
out = (
    result
    .unnest("struct_type")
    .explode(["ngram_keys", "value"])
    # Keep only n-grams with at least two states (drops unigrams)
    .filter(pl.col("ngram_keys").str.split("->").list.len() > 1)
)

With DateTime Columns

from datetime import datetime

df = pl.DataFrame({
    "session_id": ["A", "A", "A", "A"],
    "event": ["start", "click", "scroll", "exit"],
    "time": [
        datetime(2024, 1, 1, 10, 0),
        datetime(2024, 1, 1, 10, 5),
        datetime(2024, 1, 1, 10, 7),
        datetime(2024, 1, 1, 10, 15),
    ],
})

result = df.select(
    sgt.sgt_transform(
        "session_id",
        "event",
        time_col="time",
        deltatime="m",  # measure time gaps in minutes
        kappa=3,  # n-grams up to length 3 (unigrams, bigrams, trigrams)
        time_penalty="inverse",
    )
)
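
To show what a time unit such as "m" means for the timestamps above, the following pure-Python sketch converts the gaps between consecutive events into minutes. The `SECONDS_PER_UNIT` mapping and the `gaps_in_unit` helper are hypothetical and only cover fixed-length units. Whether polars-sgt measures gaps between consecutive events or across a whole n-gram is not stated here, so treat this as an illustration of the unit conversion only.

```python
from datetime import datetime

# Assumed mapping for fixed-length units; "month", "q" and "y" are left out
# because their length in seconds is not constant.
SECONDS_PER_UNIT = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}


def gaps_in_unit(times, unit):
    """Return the gaps between consecutive timestamps, expressed in `unit`."""
    scale = SECONDS_PER_UNIT[unit]
    return [
        (later - earlier).total_seconds() / scale
        for earlier, later in zip(times, times[1:])
    ]


times = [
    datetime(2024, 1, 1, 10, 0),
    datetime(2024, 1, 1, 10, 5),
    datetime(2024, 1, 1, 10, 7),
    datetime(2024, 1, 1, 10, 15),
]
print(gaps_in_unit(times, "m"))  # [5.0, 2.0, 8.0]
```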

Lazy Evaluation & Streaming

result = (
    pl.scan_csv("large_sequences.csv")
    .with_columns(pl.col("timestamp").str.to_datetime())
    .select(
        sgt.sgt_transform(
            "user_id",
            "action",
            time_col="timestamp",
            kappa=2,
            deltatime="h",
        )
    )
    .collect(streaming=True)
)

Parameters

Required

  • sequence_id_col: Column with sequence identifiers (groups)
  • state_col: Column with state/event values

Optional

  • time_col: Timestamp column (datetime, date, duration, or numeric)

  • kappa: Maximum n-gram size (default: 1)

    • 1 = unigrams only
    • 2 = unigrams + bigrams
    • 3 = unigrams + bigrams + trigrams, etc.
  • time_penalty: Time decay function (default: "inverse")

    • "inverse": weight = alpha / time_diff
    • "exponential": weight = exp(-alpha × time_diff)
    • "linear": weight = max(0, 1 - alpha × time_diff)
    • "power": weight = 1 / time_diff^beta
    • "none": No time penalty
  • mode: Normalization mode (default: "l1")

    • "l1": Sum of weights = 1
    • "l2": L2 norm = 1
    • "none": No normalization
  • length_sensitive: Apply length normalization (default: False)

  • alpha: Time penalty scale parameter (default: 1.0)

  • beta: Power parameter for "power" penalty (default: 2.0)

  • deltatime: Time unit for datetime columns

    • "s", "m", "h", "d", "w", "month", "q", "y"
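
The time-penalty formulas listed above can be evaluated by hand with a small pure-Python sketch. The function `time_penalty_weight` is a hypothetical helper, not part of polars-sgt. Two choices in it are assumptions: "none" is treated as a constant weight of 1, and a non-positive gap raises an error for "inverse" and "power", since the handling of zero gaps is not documented here.

```python
import math


def time_penalty_weight(kind, time_diff, alpha=1.0, beta=2.0):
    """Evaluate one of the documented time-penalty formulas for a single gap."""
    if kind == "none":
        # Assumption: no penalty means every gap gets weight 1.
        return 1.0
    if kind == "exponential":
        return math.exp(-alpha * time_diff)
    if kind == "linear":
        return max(0.0, 1.0 - alpha * time_diff)
    if kind in ("inverse", "power"):
        if time_diff <= 0:
            # Assumption: zero or negative gaps are rejected in this sketch.
            raise ValueError("time_diff must be positive for this penalty")
        if kind == "inverse":
            return alpha / time_diff
        return 1.0 / time_diff ** beta
    raise ValueError(f"unknown penalty: {kind!r}")


# Compare how each penalty treats a gap of 10 time units.
for kind in ("inverse", "exponential", "linear", "power", "none"):
    print(kind, time_penalty_weight(kind, 10, alpha=0.1))
```

Printing the weights side by side for the same gap is a quick way to see how sharply each penalty discounts distant events before choosing `alpha` and `beta`.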

Output

Returns a Struct with three fields:

  • sequence_id: Original sequence identifier
  • ngram_keys: List of n-gram strings (e.g., "login -> view -> purchase")
  • value: List of corresponding weights
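
The effect of the normalization mode on a list of weights can be sketched in pure Python. The `normalize` helper is hypothetical and not part of polars-sgt. It assumes normalization is applied to the weights of one sequence at a time and that an all-zero list is returned unchanged; neither detail is confirmed by this document.

```python
import math


def normalize(weights, mode="l1"):
    """Scale weights so that their L1 or L2 norm equals 1 (illustrative only)."""
    if mode == "none":
        return list(weights)
    if mode == "l1":
        norm = sum(abs(w) for w in weights)
    elif mode == "l2":
        norm = math.sqrt(sum(w * w for w in weights))
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    if norm == 0:
        # Assumption: nothing to scale, so return the weights as they are.
        return list(weights)
    return [w / norm for w in weights]


print(normalize([1.0, 1.0, 2.0], mode="l1"))  # [0.25, 0.25, 0.5]
print(normalize([3.0, 4.0], mode="l2"))
```

Under "l1" the scaled weights sum to 1, and under "l2" their squares sum to 1, which matches the descriptions of the two modes above.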

Additional DateTime Utilities

While SGT is the primary focus, polars-sgt also includes helpful datetime utilities from the original polars-xdt:

  • Timezone conversions
  • Localized date formatting
  • Julian date conversion
  • Month delta calculations

See the full API documentation for details.

Author & Acknowledgments

Author: Zedd (lytran14789@gmail.com)

Special Thanks: This project is built upon polars-xdt created by Marco Gorelli. We are grateful for his excellent foundation.

License

MIT
