Sequence Graph Transform (SGT) for Polars - Transform sequential data into weighted n-gram representations

polars-sgt

Sequence Graph Transform for Polars

Transform sequential data into powerful n-gram representations with Polars.

polars-sgt brings Sequence Graph Transform (SGT) to Polars, enabling you to:

  • ✅ Transform sequences into weighted n-gram features
  • ✅ Capture temporal patterns with time-based weighting
  • ✅ Apply flexible normalization strategies (L1, L2, or none)
  • ✅ Handle datetime, date, duration, and numeric time columns
  • ✅ Run fast, with the core written in Rust
  • ✅ Compatible with Polars lazy evaluation and streaming

What is SGT?

Sequence Graph Transform converts sequential data (like user clickstreams, sensor readings, or transaction histories) into weighted n-gram representations. It captures:

  • Sequential patterns: Unigrams, bigrams, trigrams, and higher-order n-grams
  • Temporal dynamics: Time-based weighting with multiple decay functions
  • Normalized features: L1/L2 normalization for comparable feature spaces
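To make "n-gram representation" concrete, here is a minimal pure-Python sketch that counts n-grams up to a maximum size. It does not call polars-sgt. The helper name `contiguous_ngrams` and the use of contiguous windows are assumptions made for illustration, and the library may build its n-grams differently. Only the " -> " separator is taken from this README.

```python
from collections import Counter


def contiguous_ngrams(states, max_n):
    """Count contiguous n-grams of size 1..max_n, joined with ' -> '.

    Hypothetical helper for illustration; not part of polars-sgt.
    """
    counts = Counter()
    for n in range(1, max_n + 1):
        # Slide a window of length n over the sequence.
        for start in range(len(states) - n + 1):
            counts[" -> ".join(states[start:start + n])] += 1
    return counts


clicks = ["login", "view_product", "purchase"]
ngram_counts = contiguous_ngrams(clicks, max_n=2)
for key, count in ngram_counts.items():
    print(key, count)
```

With `max_n=2` the sketch yields three unigrams and two bigrams. Time weighting and normalization would then be applied on top of raw counts like these.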

Typical use cases:

  • User behavior analysis
  • Time series feature engineering
  • Sequential pattern mining
  • Anomaly detection in sequences

Installation

Install polars-sgt from PyPI:

pip install polars-sgt

Quick Start

Basic Example

import polars as pl
import polars_sgt as sgt

# User clickstream data
df = pl.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 2],
    "action": ["login", "view_product", "purchase", "login", "view_product", "logout"],
    "timestamp": [0, 10, 20, 0, 5, 15],
})

# Generate unigrams and bigrams with exponential time decay
result = df.select(
    sgt.sgt_transform(
        "user_id",
        "action",
        time_col="timestamp",
        kappa=2,  # unigrams + bigrams
        time_penalty="exponential",
        alpha=0.1,
        mode="l1"  # L1 normalization
    ).alias("sgt_features")
)

# Extract features
features = result.select([
    pl.col("sgt_features").struct.field("sequence_id"),
    pl.col("sgt_features").struct.field("ngram_keys").alias("ngrams"),
    pl.col("sgt_features").struct.field("ngram_values").alias("weights"),
]).explode(["ngrams", "weights"])

print(features)

With DateTime Columns

from datetime import datetime

import polars as pl
import polars_sgt as sgt

df = pl.DataFrame({
    "session_id": ["A", "A", "A", "A"],
    "event": ["start", "click", "scroll", "exit"],
    "time": [
        datetime(2024, 1, 1, 10, 0),
        datetime(2024, 1, 1, 10, 5),
        datetime(2024, 1, 1, 10, 7),
        datetime(2024, 1, 1, 10, 15),
    ],
})

result = df.select(
    sgt.sgt_transform(
        "session_id",
        "event",
        time_col="time",
        deltatime="m",  # minutes
        kappa=3,  # unigrams + bigrams + trigrams
        time_penalty="inverse",
    )
)
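As a rough illustration of what a time unit such as `deltatime="m"` means for the timestamps above, the sketch below computes the gaps between consecutive events in minutes using only the standard library. The helper `gaps_in_unit` and the `UNIT_SECONDS` mapping are hypothetical, and how polars-sgt measures time differences internally is not confirmed by this README.

```python
from datetime import datetime

# Seconds per unit, for fixed-length units only (assumed mapping).
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}


def gaps_in_unit(times, unit):
    """Return the gaps between consecutive timestamps, expressed in `unit`.

    Hypothetical helper for illustration; not part of polars-sgt.
    """
    seconds = UNIT_SECONDS[unit]
    return [
        (later - earlier).total_seconds() / seconds
        for earlier, later in zip(times, times[1:])
    ]


times = [
    datetime(2024, 1, 1, 10, 0),
    datetime(2024, 1, 1, 10, 5),
    datetime(2024, 1, 1, 10, 7),
    datetime(2024, 1, 1, 10, 15),
]
minute_gaps = gaps_in_unit(times, "m")
print(minute_gaps)  # [5.0, 2.0, 8.0]
```

Calendar units such as "month", "q", and "y" have no fixed length in seconds, so they are left out of this sketch.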

Lazy Evaluation & Streaming

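import polars as pl
import polars_sgt as sgt

result = (
    pl.scan_csv("large_sequences.csv")
    .with_columns(pl.col("timestamp").str.to_datetime())
    .select(
        sgt.sgt_transform(
            "user_id",
            "action",
            time_col="timestamp",
            kappa=2,
            deltatime="h",
        )
    )
    # Newer Polars releases deprecate `streaming=True` in favor of
    # `collect(engine="streaming")`.
    .collect(streaming=True)
)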

Parameters

Required

  • sequence_id_col: Column with sequence identifiers (groups)
  • state_col: Column with state/event values

Optional

  • time_col: Timestamp column (datetime, date, duration, or numeric)

  • kappa: Maximum n-gram size (default: 1)

    • 1 = unigrams only
    • 2 = unigrams + bigrams
    • 3 = unigrams + bigrams + trigrams, etc.
  • time_penalty: Time decay function (default: "inverse")

    • "inverse": weight = alpha / time_diff
    • "exponential": weight = exp(-alpha × time_diff)
    • "linear": weight = max(0, 1 - alpha × time_diff)
    • "power": weight = 1 / time_diff^beta
    • "none": No time penalty
  • mode: Normalization mode (default: "l1")

    • "l1": Sum of weights = 1
    • "l2": L2 norm = 1
    • "none": No normalization
  • length_sensitive: Apply length normalization (default: False)

  • alpha: Time penalty scale parameter (default: 1.0)

  • beta: Power parameter for "power" penalty (default: 2.0)

  • deltatime: Time unit for datetime columns

    • "s", "m", "h", "d", "w", "month", "q", "y"
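The time penalty formulas listed above can be written out directly. The sketch below is a plain-Python transcription, not the library's implementation. The function name `time_penalty_weight` is hypothetical, and treating "none" as a constant weight of 1.0 is an assumption.

```python
import math


def time_penalty_weight(kind, time_diff, alpha=1.0, beta=2.0):
    """Evaluate one of the documented time penalty formulas.

    Hypothetical helper for illustration; not part of polars-sgt.
    """
    if kind == "inverse":
        return alpha / time_diff
    if kind == "exponential":
        return math.exp(-alpha * time_diff)
    if kind == "linear":
        return max(0.0, 1.0 - alpha * time_diff)
    if kind == "power":
        return 1.0 / time_diff ** beta
    if kind == "none":
        return 1.0  # assumption: no penalty means unit weight
    raise ValueError(f"unknown time penalty: {kind!r}")


# Compare the penalties for a gap of 10 time units with alpha=0.1.
for kind in ("inverse", "exponential", "linear", "power", "none"):
    print(kind, round(time_penalty_weight(kind, 10, alpha=0.1), 4))
```

In this sketch, "inverse" and "power" divide by the time difference, so a gap of zero raises `ZeroDivisionError`. How polars-sgt treats simultaneous events is not stated in this README.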

Output

Returns a Struct with three fields:

  • sequence_id: Original sequence identifier
  • ngram_keys: List of n-gram strings (e.g., "login -> view -> purchase")
  • ngram_values: List of corresponding weights
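Because `ngram_keys` and `ngram_values` are parallel lists, they can be zipped into a key-to-weight mapping. The sketch below also shows what the "l1" and "l2" modes mean for a list of weights. The keys and raw values are made up, the helper `normalize` is hypothetical, and whether polars-sgt normalizes over all n-grams of a sequence together (as done here) is an assumption.

```python
import math


def normalize(weights, mode="l1"):
    """Scale weights so their L1 or L2 norm is 1; "none" returns a copy.

    Hypothetical helper for illustration; not part of polars-sgt.
    """
    if mode == "none":
        return list(weights)
    if mode == "l1":
        norm = sum(abs(w) for w in weights)
    elif mode == "l2":
        norm = math.sqrt(sum(w * w for w in weights))
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    if norm == 0:
        return list(weights)  # nothing to scale
    return [w / norm for w in weights]


# Made-up example values for one sequence.
ngram_keys = ["login", "view", "login -> view"]
raw_values = [1.0, 1.0, 2.0]

l1_features = dict(zip(ngram_keys, normalize(raw_values, "l1")))
l2_features = dict(zip(ngram_keys, normalize(raw_values, "l2")))
print(l1_features)
print(l2_features)
```

After "l1" normalization the weights sum to 1, and after "l2" normalization their squares sum to 1, which makes sequences of different lengths easier to compare.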

Additional DateTime Utilities

SGT is the primary focus, but polars-sgt also keeps the datetime utilities from polars-xdt, the project it is built on:

  • Timezone conversions
  • Localized date formatting
  • Julian date conversion
  • Month delta calculations

See the full API documentation for details.

Author & Acknowledgments

Author: Zedd (lytran14789@gmail.com)

Special Thanks: This project is built upon polars-xdt created by Marco Gorelli. We are grateful for his excellent foundation.

License

MIT
