Dynamic, low-resource pattern mining with sklearn-compatible API
dynamic-pattern-mining
dynamic-pattern-mining is a scikit-learn-compatible library for mining clinical code patterns and recommending likely next codes.
Example goal:
If a patient has codes A, B, C, infer likely additional codes such as D from cohort-wide structure.
Why this approach
Compared to classic candidate-generation workflows (Apriori/FP-Growth style), this estimator is designed for:
- low memory usage via integer coding + sparse matrices
- robust behavior under code-string variants through normalization
- direct personalized ranking (recommendation), not only global frequent itemsets
- shrinkage-aware scoring for stability on sparse/rare co-occurrences
- optional second-order diffusion over the learned code graph
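The first bullet (integer coding plus sparse storage) can be illustrated with a minimal standard-library sketch. The helper names `encode_baskets` and `count_pairs` are hypothetical and are not part of the package; the library itself is described as using sparse matrices, which this sketch approximates with a dictionary that stores only pairs that actually co-occur.

```python
from collections import Counter
from itertools import combinations


def encode_baskets(baskets):
    """Map each distinct code string to a small integer id (hypothetical helper)."""
    vocab = {}
    encoded = []
    for basket in baskets:
        # setdefault assigns the next free id the first time a code is seen
        ids = sorted({vocab.setdefault(code, len(vocab)) for code in basket})
        encoded.append(ids)
    return vocab, encoded


def count_pairs(encoded):
    """Count co-occurrences, storing only pairs that occur at least once."""
    pair_counts = Counter()
    for ids in encoded:
        # ids are sorted, so every key is (i, j) with i < j
        pair_counts.update(combinations(ids, 2))
    return pair_counts


baskets = [["I10", "E11", "N18"], ["I10", "E11"], ["J45", "R06"]]
vocab, encoded = encode_baskets(baskets)
pair_counts = count_pairs(encoded)
print(vocab)
print(dict(pair_counts))
```

Only four of the ten possible code pairs are stored here, which is the property that keeps memory low when the vocabulary is large and most pairs never co-occur.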
Install
pip install dynamic-pattern-mining
Quick Start (Long Format)
import pandas as pd
from dynamic_pattern_mining import DynamicPatternMiner

# long format: one row per (patient, code)
df = pd.DataFrame(
    [
        (1, "I10"), (1, "E11"), (1, "N18"),
        (2, "I10"), (2, "E11"),
        (3, "J45"), (3, "R06"),
    ],
    columns=["patient_id", "code"],
)

miner = DynamicPatternMiner(
    patient_col="patient_id",
    code_col="code",
    min_code_frequency=1,
    min_pair_frequency=1,
)
miner.fit(df)

print(miner.recommend(["I10", "E11"], top_k=5))
print(miner.explain_recommendation(["I10", "E11"], target_code="N18"))
print(miner.mine_common_patterns(top_k=10, min_score=-1e9))
Quick Start (Basket Format)
import pandas as pd
from dynamic_pattern_mining import DynamicPatternMiner

X = pd.DataFrame(
    {
        "basket": [
            ["I10", "E11"],
            ["I10", "N18"],
            ["J45", "R06"],
        ]
    }
)

miner = DynamicPatternMiner(
    basket_col="basket",
    min_code_frequency=1,
    min_pair_frequency=1,
    output_format="sparse",
)
X_rec = miner.fit_transform(X)
print(X_rec.shape)
sklearn Pipeline Example
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from dynamic_pattern_mining import DynamicPatternMiner

X = pd.DataFrame({"basket": [["I10", "E11"], ["J45", "R06"], ["F32", "F41"]]})
y = [0, 1, 2]

pipe = Pipeline([
    # Toy data: every code occurs once, so the default thresholds
    # (min_code_frequency=3, min_pair_frequency=2) would prune every code.
    ("miner", DynamicPatternMiner(
        basket_col="basket",
        min_code_frequency=1,
        min_pair_frequency=1,
        output_format="sparse",
    )),
    ("clf", LogisticRegression(max_iter=2000)),
])
pipe.fit(X, y)
Full Parameter Reference
DynamicPatternMiner signature:
DynamicPatternMiner(
    patient_col="patient_id",
    code_col="code",
    basket_col=None,
    min_code_frequency=3,
    min_pair_frequency=2,
    max_codes=None,
    chunk_size=None,
    lowercase=True,
    normalize_text=True,
    pair_smoothing=1.0,
    shrinkage_lambda=10.0,
    popularity_penalty=0.10,
    diffusion_weight=0.25,
    output_top_k=30,
    output_format="sparse",
    dtype=np.float32,
)
Input Parsing
- patient_col: str (default "patient_id"). Patient identifier column for long-format input.
- code_col: str (default "code"). Code column for long-format input.
- basket_col: str | None (default None). Basket column if each row already contains a list/set of codes.
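The two input shapes carry the same information: long-format rows can be grouped into one basket per patient. The sketch below shows that relationship with plain Python; `long_to_baskets` is a hypothetical helper for illustration, and how the estimator performs this step internally is not documented here.

```python
from collections import defaultdict


def long_to_baskets(rows):
    """Group (patient_id, code) rows into one set of codes per patient."""
    baskets = defaultdict(set)
    for patient_id, code in rows:
        baskets[patient_id].add(code)  # a set ignores repeated codes per patient
    return dict(baskets)


rows = [
    (1, "I10"), (1, "E11"), (1, "N18"),
    (2, "I10"), (2, "E11"),
    (3, "J45"), (3, "R06"),
]
baskets = long_to_baskets(rows)
for patient_id, codes in baskets.items():
    print(patient_id, sorted(codes))
```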
Frequency / Pruning
- min_code_frequency: int (default 3). Minimum patient-level frequency for a code to be kept.
- min_pair_frequency: int (default 2). Minimum pair co-occurrence count to keep an edge.
- max_codes: int | None (default None). Optional top-K code cap after frequency filtering.
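To make the interaction of the three pruning parameters concrete, here is a rough sketch under stated assumptions: codes are counted once per patient, the cap keeps the most frequent codes, and pairs are counted only among surviving codes. The function `prune` is hypothetical, and the order of operations inside the library may differ.

```python
from collections import Counter
from itertools import combinations


def prune(baskets, min_code_frequency=3, min_pair_frequency=2, max_codes=None):
    """Return (kept codes, kept edges) after frequency-based pruning."""
    # Patient-level frequency: each basket counts a code at most once.
    code_freq = Counter(code for basket in baskets for code in set(basket))
    kept = [c for c, n in code_freq.most_common() if n >= min_code_frequency]
    if max_codes is not None:
        kept = kept[:max_codes]  # assumed: cap keeps the most frequent codes
    kept_set = set(kept)

    # Count pairs only among codes that survived the filter.
    pair_freq = Counter()
    for basket in baskets:
        codes = sorted(set(basket) & kept_set)
        pair_freq.update(combinations(codes, 2))
    edges = {pair: n for pair, n in pair_freq.items() if n >= min_pair_frequency}
    return kept_set, edges


baskets = [
    ["I10", "E11", "N18"],
    ["I10", "E11"],
    ["I10", "E11", "J45"],
    ["I10", "N18"],
]
kept, edges = prune(baskets, min_code_frequency=2, min_pair_frequency=2)
print(sorted(kept))
print(edges)
```

In this toy cohort J45 is dropped because it appears for only one patient, and the E11/N18 pair is dropped because it co-occurs only once.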
Resource / Scaling
- chunk_size: int | None (default None). Reserved chunking control for large input processing.
Normalization
- lowercase: bool (default True). Lowercase code strings.
- normalize_text: bool (default True). Normalize separators (_, -, repeated spaces) for robust matching.
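A normalization step of this kind might look like the sketch below. This is an assumption about the behavior, not the library's actual rule: here runs of underscores, hyphens, and whitespace collapse to a single space, whereas the package could equally remove them or map them to another character. `normalize_code` is a hypothetical name.

```python
import re


def normalize_code(code, lowercase=True, normalize_text=True):
    """Normalize a code string so that spelling variants compare equal."""
    text = code.strip()
    if normalize_text:
        # Collapse runs of "_", "-", and whitespace into one space (assumed rule).
        text = re.sub(r"[_\-\s]+", " ", text).strip()
    if lowercase:
        text = text.lower()
    return text


variants = ["Type_2-Diabetes", "type 2   diabetes", " TYPE-2_DIABETES "]
normalized = {normalize_code(v) for v in variants}
print(normalized)
```

All three variants collapse to one vocabulary entry, which is the effect the two flags are meant to achieve.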
Scoring / Pattern Dynamics
- pair_smoothing: float (default 1.0). Additive smoothing for conditional probability estimates.
- shrinkage_lambda: float (default 10.0). Shrinkage strength for low-support pairs.
- popularity_penalty: float (default 0.10). Penalizes globally frequent consequents to reduce trivial recommendations.
- diffusion_weight: float (default 0.25). Weight of second-order graph diffusion contribution.
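The exact scoring formula is not given in this README, so the following is only one plausible way the four parameters could combine, written to show what each knob does. Everything in it is an assumption: the smoothed conditional estimate, the support-based shrinkage toward the consequent's base rate, the linear popularity penalty, and the two-hop diffusion over non-negative scores. `pair_score` and `diffuse` are hypothetical names.

```python
import math


def pair_score(n_ij, n_i, n_j, n_patients, n_codes,
               pair_smoothing=1.0, shrinkage_lambda=10.0,
               popularity_penalty=0.10):
    """Hypothetical score for the edge i -> j (not the library's formula).

    Assumes n_i > 0 and n_j > 0.
    """
    # Additive smoothing of the conditional estimate P(j | i).
    p_cond = (n_ij + pair_smoothing) / (n_i + pair_smoothing * n_codes)
    # Baseline popularity of the consequent j.
    p_base = n_j / n_patients
    # Shrink toward the baseline when the pair has little support.
    denom = n_ij + shrinkage_lambda
    w = n_ij / denom if denom > 0 else 0.0
    p_shrunk = w * p_cond + (1.0 - w) * p_base
    # Log-lift, minus a penalty that grows with the consequent's popularity.
    return math.log(p_shrunk / p_base) - popularity_penalty * p_base


def diffuse(scores, diffusion_weight=0.25):
    """Add weighted two-hop paths src -> mid -> dst to the direct scores.

    scores maps src -> {dst: score}; scores are assumed non-negative here.
    """
    out = {src: dict(nbrs) for src, nbrs in scores.items()}
    for src, nbrs in scores.items():
        for mid, s1 in nbrs.items():
            for dst, s2 in scores.get(mid, {}).items():
                if dst != src:
                    out[src][dst] = out[src].get(dst, 0.0) + diffusion_weight * s1 * s2
    return out


low = pair_score(n_ij=2, n_i=4, n_j=200, n_patients=1000, n_codes=50)
high = pair_score(n_ij=200, n_i=400, n_j=200, n_patients=1000, n_codes=50)
print(low < high)

graph = {"a": {"b": 1.0}, "b": {"c": 0.8}}
diffused = diffuse(graph)
print(diffused["a"])
```

Both pairs have the same raw ratio n_ij / n_i = 0.5, but the low-support pair is pulled toward the base rate and scores lower. In the graph example, `a` has no direct edge to `c` and gains one only through `b`, scaled by `diffusion_weight`.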
Output Control
- output_top_k: int (default 30). Max number of positive recommendations kept per sample in transform.
- output_format: {"sparse", "dense", "pandas"} (default "sparse"). Return type of transform.
- dtype: numpy dtype (default np.float32). Numeric dtype for learned scores and outputs.
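The effect of output_top_k on a single sample can be sketched as follows, assuming "positive" means a score strictly greater than zero. `top_k_positive` is a hypothetical helper operating on a plain dictionary rather than on the estimator's sparse output.

```python
import heapq


def top_k_positive(scores, output_top_k=30):
    """Keep at most output_top_k positive-scoring codes, highest first."""
    positive = [(code, s) for code, s in scores.items() if s > 0]
    return heapq.nlargest(output_top_k, positive, key=lambda item: item[1])


row = {"n18": 0.9, "i50": 0.4, "j45": -0.2, "e78": 0.0, "r06": 0.1}
kept = top_k_positive(row, output_top_k=2)
print(kept)
```

Truncating each row this way is what keeps the transformed matrix sparse even when the vocabulary is large.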
Main Methods
- fit(X): Learns code vocabulary, pair graph, and dynamic score matrix.
- transform(X): Returns recommendation-score features per sample.
- recommend(basket, top_k=10): Personalized top-code recommendations.
- explain_recommendation(basket, target_code, top_drivers=5): Source-code contributions for a target recommendation.
- mine_common_patterns(top_k=20, min_score=0.0): Global antecedent→consequent patterns from learned score graph.
- get_feature_names_out(): Feature names for transformed output.
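One way to picture how a recommendation and its explanation relate is an additive model: each code in the basket contributes its edge score to every candidate, and the explanation lists those per-source contributions. The sketch below assumes that additive form, which this README does not confirm; `recommend_from_scores` and `explain_from_scores` are hypothetical stand-ins, not the library's methods.

```python
def recommend_from_scores(scores, basket, top_k=10):
    """Sum edge scores from every basket code; rank codes not already present."""
    totals = {}
    for src in basket:
        for dst, s in scores.get(src, {}).items():
            if dst not in basket:
                totals[dst] = totals.get(dst, 0.0) + s
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


def explain_from_scores(scores, basket, target_code, top_drivers=5):
    """List the basket codes that contribute most to the target's total."""
    contribs = [(src, scores.get(src, {}).get(target_code, 0.0)) for src in basket]
    contribs = [c for c in contribs if c[1] != 0.0]
    return sorted(contribs, key=lambda kv: kv[1], reverse=True)[:top_drivers]


# Toy score graph: src -> {dst: score}
scores = {
    "i10": {"n18": 0.5, "e11": 0.5},
    "e11": {"n18": 0.25, "i10": 0.5},
}
basket = ["i10", "e11"]
recs = recommend_from_scores(scores, basket)
drivers = explain_from_scores(scores, basket, target_code="n18")
print(recs)
print(drivers)
```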
FP-Growth Benchmark
Run the built-in benchmark comparison:
python src/dynamic_pattern_mining/benchmarks/fp_growth_benchmark.py
It reports:
- recall_at_5_dynamic_pattern_miner
- recall_at_5_fp_growth
- delta
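The benchmark's evaluation protocol is not described here, but recall at 5 is conventionally the fraction of held-out codes that appear among the top five recommendations. A minimal sketch of that metric, with the hypothetical helper `recall_at_k` and made-up toy inputs:

```python
def recall_at_k(recommended, held_out, k=5):
    """Fraction of held-out codes found among the top-k recommendations."""
    if not held_out:
        return 0.0
    hits = len(set(recommended[:k]) & set(held_out))
    return hits / len(held_out)


# Ranked recommendations for one patient and the codes hidden from the model.
ranked = ["n18", "i50", "e78", "j45", "r06", "k21"]
hidden = ["n18", "k21"]
score = recall_at_k(ranked, hidden, k=5)
print(score)
```

Here one of the two hidden codes lands in the top five, while "k21" is ranked sixth and does not count.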
Development
pip install -e ".[dev]"
pytest
python -m build