
Sequence Graph Transform (SGT) for Polars - Transform sequential data into weighted n-gram representations


polars-sgt

Sequence Graph Transform for Polars


Transform sequential data into powerful n-gram representations with Polars.

polars-sgt brings Sequence Graph Transform (SGT) to Polars, enabling you to:

  • ✅ Transform sequences into weighted n-gram features
  • ✅ Capture temporal patterns with time-based weighting
  • ✅ Apply flexible normalization strategies (L1, L2, or none)
  • ✅ Handle datetime, date, duration, and numeric time columns
  • ✅ Run fast, with a core written in Rust
  • ✅ Work with Polars lazy evaluation and streaming

What is SGT?

Sequence Graph Transform converts sequential data (like user clickstreams, sensor readings, or transaction histories) into weighted n-gram representations. It captures:

  • Sequential patterns: Unigrams, bigrams, trigrams, and higher-order n-grams
  • Temporal dynamics: Time-based weighting with multiple decay functions
  • Normalized features: L1/L2 normalization for comparable feature spaces
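
To make the "sequential patterns" idea concrete, here is a minimal pure-Python sketch of enumerating n-grams up to a maximum size. It is an illustration only, not the library's Rust implementation: the helper name `contiguous_ngrams` is hypothetical, and it assumes that n-grams are contiguous runs of states and that keys are joined with " -> ".

```python
from typing import List


def contiguous_ngrams(states: List[str], kappa: int) -> List[str]:
    """Return every contiguous n-gram of size 1..kappa as a ' -> '-joined key.

    Hypothetical helper for illustration; polars-sgt may build its keys differently.
    """
    ngrams = []
    for n in range(1, kappa + 1):
        # Slide a window of length n over the sequence.
        for start in range(len(states) - n + 1):
            ngrams.append(" -> ".join(states[start:start + n]))
    return ngrams


clicks = ["login", "view_product", "purchase"]
print(contiguous_ngrams(clicks, kappa=2))
# ['login', 'view_product', 'purchase', 'login -> view_product', 'view_product -> purchase']
```

A sequence shorter than the requested size simply yields no n-grams of that size, so short sessions contribute only the lower-order features.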

Perfect for:

  • User behavior analysis
  • Time series feature engineering
  • Sequential pattern mining
  • Anomaly detection in sequences

Installation

Install polars-sgt from PyPI:

pip install polars-sgt

Quick Start

Basic Example

import polars as pl
import polars_sgt as sgt

# User clickstream data
df = pl.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 2],
    "action": ["login", "view_product", "purchase", "login", "view_product", "logout"],
    "timestamp": [0, 10, 20, 0, 5, 15],
})

# Generate bigrams with exponential time decay
result = df.select(
    sgt.sgt_transform(
        "user_id",
        "action",
        time_col="timestamp",
        kappa=2,  # bigrams
        time_penalty="exponential",
        alpha=0.1,
        mode="l1"  # L1 normalization
    ).alias("sgt_features")
)

# Extract features
features = result.select([
    pl.col("sgt_features").struct.field("sequence_id"),
    pl.col("sgt_features").struct.field("ngram_keys").alias("ngrams"),
    pl.col("sgt_features").struct.field("value").alias("weights"),
]).explode(["ngrams", "weights"])

print(features)

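# Alternatively, unnest the struct and explode the n-gram lists directly
result = df.select(
    sgt.sgt_transform(
        "user_id",
        "action",
        time_col="timestamp",  # numeric column, so no deltatime is needed
        kappa=3,  # trigrams
        time_penalty="inverse",
        mode="l2",  # L2 normalization
        alpha=0.5,
    ).alias("struct_type")
)
out = (
    result
    .unnest("struct_type")
    .explode(["ngram_keys", "value"])
    # Splitting on "->" always yields at least one part, so this keeps every
    # non-null n-gram; raise the threshold (e.g. > 1) to drop unigrams.
    .filter(pl.col("ngram_keys").str.split("->").list.len() > 0)
)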

With DateTime Columns

from datetime import datetime

df = pl.DataFrame({
    "session_id": ["A", "A", "A", "A"],
    "event": ["start", "click", "scroll", "exit"],
    "time": [
        datetime(2024, 1, 1, 10, 0),
        datetime(2024, 1, 1, 10, 5),
        datetime(2024, 1, 1, 10, 7),
        datetime(2024, 1, 1, 10, 15),
    ],
})

result = df.select(
    sgt.sgt_transform(
        "session_id",
        "event",
        time_col="time",
        deltatime="m",  # minutes
        kappa=3,  # trigrams
        time_penalty="inverse",
    )
)
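
As a rough illustration of what a time unit such as deltatime="m" implies, the sketch below expresses the gaps between consecutive datetimes in a chosen unit. It is a pure-Python sketch, not the library's code: the `UNIT` table and the `gaps_in_unit` helper are hypothetical, and it assumes that elapsed time is simply divided by the unit length. Only fixed-length units are covered; calendar units such as "month", "q", and "y" are left out because their length varies.

```python
from datetime import datetime, timedelta

# Hypothetical unit table covering only fixed-length units.
UNIT = {
    "s": timedelta(seconds=1),
    "m": timedelta(minutes=1),
    "h": timedelta(hours=1),
    "d": timedelta(days=1),
    "w": timedelta(weeks=1),
}


def gaps_in_unit(times, deltatime):
    """Return the gap between consecutive timestamps, measured in `deltatime` units."""
    unit = UNIT[deltatime]
    # Dividing one timedelta by another yields a plain float.
    return [(later - earlier) / unit for earlier, later in zip(times, times[1:])]


times = [
    datetime(2024, 1, 1, 10, 0),
    datetime(2024, 1, 1, 10, 5),
    datetime(2024, 1, 1, 10, 7),
    datetime(2024, 1, 1, 10, 15),
]
print(gaps_in_unit(times, "m"))  # [5.0, 2.0, 8.0]
```

The choice of unit matters because the time penalty is applied to these numbers: the same session measured in seconds produces gaps 60 times larger than in minutes, and therefore much stronger decay for the same alpha.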

Lazy Evaluation & Streaming

result = (
    pl.scan_csv("large_sequences.csv")
    .with_columns(pl.col("timestamp").str.to_datetime())
    .select(
        sgt.sgt_transform(
            "user_id",
            "action",
            time_col="timestamp",
            kappa=2,
            deltatime="h",
        )
    )
    .collect(streaming=True)  # newer Polars releases spell this collect(engine="streaming")
)

Parameters

Required

  • sequence_id_col: Column with sequence identifiers (groups)
  • state_col: Column with state/event values

Optional

  • time_col: Timestamp column (datetime, date, duration, or numeric)

  • kappa: Maximum n-gram size (default: 1)

    • 1 = unigrams only
    • 2 = unigrams + bigrams
    • 3 = unigrams + bigrams + trigrams, etc.
  • time_penalty: Time decay function (default: "inverse")

    • "inverse": weight = alpha / time_diff
    • "exponential": weight = exp(-alpha × time_diff)
    • "linear": weight = max(0, 1 - alpha × time_diff)
    • "power": weight = 1 / time_diff^beta
    • "none": No time penalty
  • mode: Normalization mode (default: "l1")

    • "l1": Sum of weights = 1
    • "l2": L2 norm = 1
    • "none": No normalization
  • length_sensitive: Apply length normalization (default: False)

  • alpha: Time penalty scale parameter (default: 1.0)

  • beta: Power parameter for "power" penalty (default: 2.0)

  • deltatime: Time unit for datetime columns

    • "s", "m", "h", "d", "w", "month", "q", "y"
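
The penalty formulas and normalization modes listed above can be written out in a few lines of plain Python. This is a sketch of the documented formulas only, not the library's implementation: the function names `time_weight` and `normalize` are hypothetical, a weight of 1.0 for "none" is an assumption, and how polars-sgt treats a zero time difference is not stated here, so the sketch leaves that case unhandled.

```python
import math


def time_weight(time_diff, penalty="inverse", alpha=1.0, beta=2.0):
    """Weight for one time gap, following the formulas in the parameter list."""
    if penalty == "none":
        return 1.0  # assumption: no penalty means a constant weight
    if penalty == "inverse":
        return alpha / time_diff  # undefined for time_diff == 0
    if penalty == "exponential":
        return math.exp(-alpha * time_diff)
    if penalty == "linear":
        return max(0.0, 1.0 - alpha * time_diff)
    if penalty == "power":
        return 1.0 / time_diff ** beta  # undefined for time_diff == 0
    raise ValueError(f"unknown time_penalty: {penalty!r}")


def normalize(weights, mode="l1"):
    """Scale weights so they sum to 1 ("l1") or have unit Euclidean norm ("l2")."""
    if mode == "none":
        return list(weights)
    if mode == "l1":
        norm = sum(abs(w) for w in weights)
    elif mode == "l2":
        norm = math.sqrt(sum(w * w for w in weights))
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    # Leave an all-zero vector untouched rather than dividing by zero.
    return [w / norm for w in weights] if norm else list(weights)


print(time_weight(2, "inverse", alpha=0.5))  # 0.25
print(time_weight(5, "linear", alpha=0.5))   # 0.0
print(normalize([1.0, 3.0], "l1"))           # [0.25, 0.75]
```

Note that "linear" reaches zero once alpha × time_diff is at least 1, while "exponential" only approaches zero, so the two behave very differently for long gaps.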

Output

Returns a Struct with three fields:

  • sequence_id: Original sequence identifier
  • ngram_keys: List of n-gram strings (e.g., "login -> view -> purchase")
  • value: List of corresponding weights
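
To show how the two parallel lists relate, the sketch below flattens rows shaped like the struct described above into one feature vector per sequence. It uses plain Python dictionaries rather than Polars, the helper names are hypothetical, and the weights are made-up placeholder values, not real output from sgt_transform.

```python
from collections import defaultdict

# Placeholder rows mimicking the documented fields; the numbers are invented.
rows = [
    {"sequence_id": 1, "ngram_keys": ["login", "login -> view_product"], "value": [0.6, 0.4]},
    {"sequence_id": 2, "ngram_keys": ["login", "login -> logout"], "value": [0.7, 0.3]},
]


def to_feature_table(rows):
    """Pair each n-gram key with its weight, grouped by sequence_id."""
    table = defaultdict(dict)
    for row in rows:
        # ngram_keys and value are parallel lists: position i in one matches i in the other.
        for key, weight in zip(row["ngram_keys"], row["value"]):
            table[row["sequence_id"]][key] = weight
    return dict(table)


def to_dense(table):
    """Align all sequences on a shared, sorted set of n-gram columns (missing = 0.0)."""
    columns = sorted({key for feats in table.values() for key in feats})
    matrix = {sid: [feats.get(col, 0.0) for col in columns] for sid, feats in table.items()}
    return columns, matrix


columns, matrix = to_dense(to_feature_table(rows))
print(columns)  # ['login', 'login -> logout', 'login -> view_product']
print(matrix)   # {1: [0.6, 0.0, 0.4], 2: [0.7, 0.3, 0.0]}
```

Filling absent n-grams with 0.0 gives every sequence a vector of the same length, which is the shape most downstream models expect.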

Additional DateTime Utilities

While SGT is the primary focus, polars-sgt also includes helpful datetime utilities from the original polars-xdt:

  • Timezone conversions
  • Localized date formatting
  • Julian date conversion
  • Month delta calculations

See the full API documentation for details.

Author & Acknowledgments

Author: Zedd (lytran14789@gmail.com)

Special Thanks: This project is built upon polars-xdt created by Marco Gorelli. We are grateful for his excellent foundation.

License

MIT


