Dynamic, low-resource pattern mining with sklearn-compatible API

dynamic-pattern-mining

dynamic-pattern-mining is a scikit-learn-compatible library for mining clinical code patterns and recommending likely next codes.

Example goal:

If a patient has codes A, B, and C, infer likely additional codes such as D from co-occurrence structure across the whole cohort.

Why this approach

Compared to classic frequent-itemset mining workflows (Apriori/FP-Growth style), this estimator is designed for:

  • low memory usage via integer coding + sparse matrices
  • robust behavior under code-string variants through normalization
  • direct personalized ranking (recommendation), not only global frequent itemsets
  • shrinkage-aware scoring for stability on sparse/rare co-occurrences
  • optional second-order diffusion over the learned code graph
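
The first bullet can be illustrated with a small standard-library sketch. It is not the library's internal code: `encode_baskets` and `count_pairs` are hypothetical helpers, and the CSR-style layout is an assumption about what "integer coding + sparse matrices" could look like. Each code string is stored once in a vocabulary, and baskets are kept as flat integer arrays.

```python
from array import array
from collections import Counter
from itertools import combinations


def encode_baskets(baskets):
    """Map code strings to integer ids and store baskets in CSR-style arrays."""
    vocab = {}                 # code string -> integer id
    indptr = array("l", [0])   # row boundaries into `indices`
    indices = array("l")       # concatenated, de-duplicated ids per basket
    for basket in baskets:
        ids = sorted({vocab.setdefault(code, len(vocab)) for code in basket})
        indices.extend(ids)
        indptr.append(len(indices))
    return vocab, indptr, indices


def count_pairs(indptr, indices):
    """Count co-occurring id pairs, keyed as (smaller_id, larger_id)."""
    pair_counts = Counter()
    for row in range(len(indptr) - 1):
        ids = indices[indptr[row]:indptr[row + 1]]
        pair_counts.update(combinations(ids, 2))
    return pair_counts


baskets = [["I10", "E11", "N18"], ["I10", "E11"], ["J45", "R06"]]
vocab, indptr, indices = encode_baskets(baskets)
pair_counts = count_pairs(indptr, indices)
print(pair_counts[(vocab["I10"], vocab["E11"])])
```

Only pairs that actually co-occur are stored, so memory grows with the number of observed pairs rather than with the square of the vocabulary size.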

Install

pip install dynamic-pattern-mining

Quick Start (Long Format)

import pandas as pd
from dynamic_pattern_mining import DynamicPatternMiner

# long format: one row per (patient, code)
df = pd.DataFrame(
    [
        (1, "I10"), (1, "E11"), (1, "N18"),
        (2, "I10"), (2, "E11"),
        (3, "J45"), (3, "R06"),
    ],
    columns=["patient_id", "code"],
)

miner = DynamicPatternMiner(
    patient_col="patient_id",
    code_col="code",
    min_code_frequency=1,
    min_pair_frequency=1,
)

miner.fit(df)

print(miner.recommend(["I10", "E11"], top_k=5))
print(miner.explain_recommendation(["I10", "E11"], target_code="N18"))
print(miner.mine_common_patterns(top_k=10, min_score=-1e9))

Quick Start (Basket Format)

import pandas as pd
from dynamic_pattern_mining import DynamicPatternMiner

X = pd.DataFrame(
    {
        "basket": [
            ["I10", "E11"],
            ["I10", "N18"],
            ["J45", "R06"],
        ]
    }
)

miner = DynamicPatternMiner(
    basket_col="basket",
    min_code_frequency=1,
    min_pair_frequency=1,
    output_format="sparse",
)

X_rec = miner.fit_transform(X)
print(X_rec.shape)

sklearn Pipeline Example

import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression

from dynamic_pattern_mining import DynamicPatternMiner

X = pd.DataFrame({"basket": [["I10", "E11"], ["J45", "R06"], ["F32", "F41"]]})
y = [0, 1, 2]

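# Every code in this toy dataset occurs once, so the frequency thresholds are
# lowered from their defaults (3 and 2) to keep the codes and pairs.
pipe = Pipeline([
    ("miner", DynamicPatternMiner(
        basket_col="basket",
        min_code_frequency=1,
        min_pair_frequency=1,
        output_format="sparse",
    )),
    ("clf", LogisticRegression(max_iter=2000)),
])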

pipe.fit(X, y)

Full Parameter Reference

DynamicPatternMiner signature:

DynamicPatternMiner(
    patient_col="patient_id",
    code_col="code",
    basket_col=None,
    min_code_frequency=3,
    min_pair_frequency=2,
    max_codes=None,
    chunk_size=None,
    lowercase=True,
    normalize_text=True,
    pair_smoothing=1.0,
    shrinkage_lambda=10.0,
    popularity_penalty=0.10,
    diffusion_weight=0.25,
    output_top_k=30,
    output_format="sparse",
    dtype=np.float32,
)

Input Parsing

  • patient_col: str (default "patient_id") Patient identifier column for long-format input.
  • code_col: str (default "code") Code column for long-format input.
  • basket_col: str | None (default None) Basket column if each row already contains a list/set of codes.

Frequency / Pruning

  • min_code_frequency: int (default 3) Minimum patient-level frequency for a code to be kept.
  • min_pair_frequency: int (default 2) Minimum pair co-occurrence count to keep an edge.
  • max_codes: int | None (default None) Optional top-K code cap after frequency filtering.

Resource / Scaling

  • chunk_size: int | None (default None) Reserved chunking control for large input processing.

Normalization

  • lowercase: bool (default True) Lowercase code strings.
  • normalize_text: bool (default True) Normalize separators (_, -, repeated spaces) for robust matching.
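
The two normalization flags might behave roughly like the sketch below. `normalize_code` is a hypothetical helper, and collapsing separators into a single space is an assumption; the library may map them differently (for example, by removing them).

```python
import re

# Runs of underscores, hyphens, and whitespace are treated as one separator.
_SEPARATORS = re.compile(r"[_\-\s]+")


def normalize_code(code, lowercase=True, normalize_text=True):
    """Return a canonical form of a code string for matching."""
    text = str(code).strip()
    if normalize_text:
        text = _SEPARATORS.sub(" ", text).strip()
    if lowercase:
        text = text.lower()
    return text


print(normalize_code("Type_2--Diabetes  Mellitus"))
print(normalize_code(" I10 "))
```

With this kind of canonical form, variants such as "Type_2-Diabetes" and "type 2 diabetes" map to the same vocabulary entry.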

Scoring / Pattern Dynamics

  • pair_smoothing: float (default 1.0) Additive smoothing for conditional probability estimates.
  • shrinkage_lambda: float (default 10.0) Shrinkage strength for low-support pairs.
  • popularity_penalty: float (default 0.10) Penalizes globally frequent consequents to reduce trivial recommendations.
  • diffusion_weight: float (default 0.25) Weight of second-order graph diffusion contribution.
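
The exact scoring formula is not documented here, so the following is only one plausible way the four parameters could combine. All formulas and helper names (`first_order_scores`, `diffuse`) are assumptions: a smoothed conditional probability is turned into a log-lift, scaled by a shrinkage factor n/(n + lambda), reduced by a popularity term, and then extended with two-hop paths.

```python
import math
from collections import defaultdict


def first_order_scores(code_count, pair_count, n_patients,
                       pair_smoothing=1.0, shrinkage_lambda=10.0,
                       popularity_penalty=0.10):
    """Directed scores src -> dst from symmetric pair counts (assumed formula)."""
    vocab_size = len(code_count)
    scores = defaultdict(dict)
    for (a, b), n_ab in pair_count.items():
        for src, dst in ((a, b), (b, a)):
            # Additively smoothed estimate of P(dst | src).
            p_cond = (n_ab + pair_smoothing) / (
                code_count[src] + pair_smoothing * vocab_size
            )
            p_dst = code_count[dst] / n_patients
            # Low-support pairs are pulled toward zero.
            shrink = n_ab / (n_ab + shrinkage_lambda)
            scores[src][dst] = (
                shrink * math.log(p_cond / p_dst) - popularity_penalty * p_dst
            )
    return scores


def diffuse(scores, diffusion_weight=0.25):
    """Add weighted two-hop contributions src -> mid -> dst over positive edges."""
    out = {src: dict(row) for src, row in scores.items()}
    for src, row in scores.items():
        for mid, s1 in row.items():
            if s1 <= 0:
                continue
            for dst, s2 in scores.get(mid, {}).items():
                if dst == src or s2 <= 0:
                    continue
                out[src][dst] = out[src].get(dst, 0.0) + diffusion_weight * s1 * s2
    return out


code_count = {"a": 40, "b": 30, "c": 20}
pair_count = {("a", "b"): 25, ("b", "c"): 15}
scores = first_order_scores(code_count, pair_count, n_patients=1000)
diffused = diffuse(scores)
print("c" in scores["a"], "c" in diffused["a"])
```

In this toy graph, a and c never co-occur directly, so a has no first-order score for c; the diffusion step adds one through the shared neighbor b.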

Output Control

  • output_top_k: int (default 30) Max number of positive recommendations kept per sample in transform.
  • output_format: {"sparse", "dense", "pandas"} (default "sparse") Return type of transform.
  • dtype: numpy dtype (default np.float32) Numeric dtype for learned scores and outputs.
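
To make the `output_top_k` description concrete, here is a small sketch of keeping only the strongest positive scores for one sample. `top_k_positive` is a hypothetical helper and the dictionary row format is an assumption; the estimator itself returns sparse, dense, or pandas output.

```python
import heapq


def top_k_positive(row_scores, output_top_k=30):
    """Keep at most `output_top_k` entries with strictly positive scores."""
    positive = [(score, code) for code, score in row_scores.items() if score > 0]
    best = heapq.nlargest(output_top_k, positive)
    return {code: score for score, code in best}


row = {"n18": 1.25, "h35": 0.25, "j45": -0.4, "r06": 0.0}
print(top_k_positive(row, output_top_k=1))
print(top_k_positive(row))
```

Dropping zero and negative entries keeps each output row short, which is what makes a sparse return type worthwhile.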

Main Methods

  • fit(X) Learns code vocabulary, pair graph, and dynamic score matrix.
  • transform(X) Returns recommendation-score features per sample.
  • recommend(basket, top_k=10) Personalized top-code recommendations.
  • explain_recommendation(basket, target_code, top_drivers=5) Source-code contributions for a target recommendation.
  • mine_common_patterns(top_k=20, min_score=0.0) Global antecedent→consequent patterns from learned score graph.
  • get_feature_names_out() Feature names for transformed output.
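
Conceptually, `recommend` and `explain_recommendation` can be read as two views of the same score graph. The sketch below assumes scores are simply summed over the codes in the basket, which may differ from the library's actual aggregation; the functions and the `scores` dictionary are illustrative stand-ins, not the real methods.

```python
def recommend(scores, basket, top_k=10):
    """Rank codes not in the basket by their summed score from basket codes."""
    present = set(basket)
    totals = {}
    for src in present:
        for dst, score in scores.get(src, {}).items():
            if dst not in present:
                totals[dst] = totals.get(dst, 0.0) + score
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


def explain(scores, basket, target_code, top_drivers=5):
    """List the basket codes that contribute most to one target code."""
    contributions = [
        (src, scores[src][target_code])
        for src in set(basket)
        if target_code in scores.get(src, {})
    ]
    contributions.sort(key=lambda kv: (-kv[1], kv[0]))
    return contributions[:top_drivers]


# Toy score graph: scores[src][dst] is the learned strength of src -> dst.
scores = {
    "i10": {"n18": 0.75, "e11": 0.5},
    "e11": {"n18": 0.5, "i10": 0.5, "h35": 0.25},
}
print(recommend(scores, ["i10", "e11"], top_k=2))
print(explain(scores, ["i10", "e11"], target_code="n18"))
```

Under this additive assumption, the per-driver contributions returned by `explain` sum to the total score that `recommend` reports for the same target.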

FP-Growth Benchmark

Run the built-in benchmark comparison:

python src/dynamic_pattern_mining/benchmarks/fp_growth_benchmark.py

It reports:

  • recall_at_5_dynamic_pattern_miner
  • recall_at_5_fp_growth
  • delta
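
For readers unfamiliar with the metric, recall@5 is commonly computed as the share of held-out codes that appear among the top five recommendations. The sketch below shows that common definition; the benchmark script's exact protocol is not described here, and reading `delta` as the difference between the two recall values is an assumption.

```python
def recall_at_k(recommended, held_out, k=5):
    """Fraction of held-out codes found in the top-k list of the same patient."""
    hits = 0
    total = 0
    for recs, truth in zip(recommended, held_out):
        truth_set = set(truth)
        hits += len(set(recs[:k]) & truth_set)
        total += len(truth_set)
    return hits / total if total else 0.0


# One ranked list per patient, plus the codes hidden from the model.
recommended = [["n18", "e11", "i10"], ["r06", "j45"]]
held_out = [["n18"], ["j44"]]
print(recall_at_k(recommended, held_out, k=5))
```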

Development

pip install -e ".[dev]"
pytest
python -m build
