Similarity-Adaptive Monotonic Entropy: frequent-itemset mining with FWER-controlled rules.
same-fim: Similarity-Adaptive Monotonic Entropy for frequent itemset mining
A Python reference implementation of SAME, a frequent-itemset miner that (1) derives its support thresholds from the information content of the data, and (2) attaches a Tarone--Bonferroni FWER guarantee to every returned rule.
Full method, theorems, and evaluation: Necir & Benarab, "SAME: Similarity-Adaptive Monotonic Entropy Based Method for Frequent Itemsets Extraction" (2026, under review at e-Informatica Software Engineering Journal).
Install
pip install same-fim
Optional baselines used in the paper's benchmark:
pip install "same-fim[baselines]"
Minimal example
import pandas as pd
from same_fim import SAME
df = pd.read_csv("my_binary_data.csv").astype("int8")
# parameter-free mode: alpha and persistence are derived from the data
est = SAME(auto_hyperparams=True, search_mode="dfs", max_k=5)
est.fit(df.values, feature_names=list(df.columns))
for r in est.result_.rules:
    if r.passes_fwer:
        print(r)
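The example above assumes the CSV already holds a 0/1 transaction matrix. Starting from continuous measurements, a simple median split can produce one; this preprocessing sketch is illustrative and not part of same-fim:

```python
import pandas as pd

# Toy continuous data standing in for real-valued features.
df = pd.DataFrame({"a": [0.1, 0.9, 0.5, 0.7],
                   "b": [3.0, 1.0, 2.0, 4.0]})

# Median split: 1 where a value exceeds its column median, else 0.
binary = (df > df.median()).astype("int8")
```

Any binarization that preserves the semantics of "item present" works; the median split is just the simplest choice.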
Command-line
same-mine --input data.csv --out rules.csv --mode dfs --auto --fwer-only
Reproducing the paper
From a checkout of the repository:
pip install -e ".[baselines,dev]"
python experiments/reproduce.py
This runs the domain benchmark on the five datasets (ABIDE, EEG Eye State,
synth_neuro, ClinVar, Pfam-UniProt), the scaling study (n up to 10^6),
the downstream classification probes on ABIDE and EEG, the
auto-hyperparameter ablation, and regenerates every figure in the paper's
fig_v2/ directory.
Seeded variance run (Table tab:variance)
The single-seed numbers in the main tables are supplemented by a 5-seed
variance run whose output populates \TBD{...} placeholders in the LaTeX:
python experiments/bench_seeded.py \
--seeds 5 --timeout 1800 \
--datasets abide eeg synth_neuro clinvar pfam \
--methods same_dfs same_opus apriori apriori_bonferroni fpgrowth \
--out results/variance.csv
# From ../EINF-PAPER/, inject the CSV values into paper.tex in-place.
python wire_variance.py --csv ../SAME_v4/experiments/results/variance.csv \
--tex paper.tex
wire_variance.py writes a .bak backup on first run and is idempotent: re-running with a refreshed CSV overwrites any previously substituted cells instead of stacking substitutions.
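The substitution itself amounts to a regex pass over the TeX source. A minimal sketch of the idea (fill_placeholders is a hypothetical stand-in, not the actual code of wire_variance.py):

```python
import re

def fill_placeholders(tex: str, values: dict) -> str:
    # Replace each \TBD{key} with its CSV-derived value; keys missing from
    # the mapping are left untouched so a partial CSV still applies cleanly.
    return re.sub(r"\\TBD\{([^}]*)\}",
                  lambda m: values.get(m.group(1), m.group(0)),
                  tex)
```

In the real script, idempotency comes from substituting into the pristine .bak copy each run, so refreshed CSV values replace earlier ones.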
Apriori + post-hoc Bonferroni baseline
python experiments/apriori_bonferroni.py \
--csv datasets/abide.csv --sigma 0.10 --alpha 0.05 \
--method bonferroni --out apriori_bonf_abide.csv
This is the reviewer-requested isolation of "adaptive threshold" from "FWER
correction": same Fisher test and same alpha as SAME, differing only in
the support threshold. See [paper.tex tab:baselines_ext] for the
side-by-side at sigma = 0.10.
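Post-hoc Bonferroni here means scaling each Fisher p-value by the total number of tested itemsets. A minimal sketch (function name hypothetical):

```python
def bonferroni_adjust(pvals):
    # Multiply each raw p-value by the number of tests, capped at 1.0.
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]
```

Contrast with Tarone--Bonferroni, where m counts only the testable itemsets, yielding a less conservative correction at the same alpha.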
Docker baselines
Dockerfiles for LAMP, SPuManTE, WYlight, OPUS Miner, SPMF, and Kingfisher
are in experiments/baselines_ext/docker/. See that directory's README for
per-image build instructions and the running order.
Core guarantees
SAME returns association rules with:
- A data-derived support threshold combining a LAMP-style base floor s_0, a Webb (2007) layered per-level decay, a Hoeffding margin, and a Matthews-rescaled cohesion penalty.
- Tarone--Bonferroni FWER control at a user-selected alpha (default 0.05) on the exact testable count.
- Polynomial time in n for fixed maximum itemset cardinality, with Roaring-bitmap TID lists.
auto_hyperparams=True removes the Hoeffding-margin fraction and the
persistence threshold from the user-facing interface, leaving only the
standard statistical confidence level alpha.
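To illustrate the Tarone side of the guarantee, the sketch below (a simplification, not the package's implementation) raises the support floor until every remaining itemset's minimum achievable Fisher p-value clears alpha divided by the testable count:

```python
from math import comb

def min_pvalue(x, n, n1):
    # Smallest Fisher exact p-value attainable by an itemset of support x
    # when n1 of n transactions carry the positive label: the hypergeometric
    # mass of the most extreme table (clamped at x = n1 in this sketch).
    x = min(x, n1)
    return comb(n1, x) / comb(n, x)

def tarone_floor(supports, n, n1, alpha=0.05):
    # Smallest sigma such that the m itemsets with support >= sigma are all
    # testable at the Bonferroni-corrected level alpha / m.
    for sigma in range(1, n + 1):
        m = sum(s >= sigma for s in supports)
        if m == 0 or min_pvalue(sigma, n, n1) <= alpha / m:
            return sigma, m
```

Because min_pvalue falls and the testable count shrinks as sigma grows, the loop stops at the least conservative floor that still controls FWER at alpha.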
Citation
@article{necirbenarab2026same,
author = {Hamid Necir and Massyl Benarab},
title = {{SAME}: Similarity-Adaptive Monotonic Entropy Based Method for
Frequent Itemsets Extraction},
journal = {e-Informatica Software Engineering Journal},
year = {2026},
note = {Under review}
}
License
MIT. See LICENSE.