
same-fim: Similarity-Adaptive Monotonic Entropy for frequent itemset mining

Python reference implementation of SAME, a frequent-itemset miner that (1) derives its support thresholds from the information content of the data, and (2) attaches a Tarone–Bonferroni FWER guarantee to every returned rule.

Full method, theorems, and evaluation: Necir & Benarab, "SAME: Similarity-Adaptive Monotonic Entropy Based Method for Frequent Itemsets Extraction" (2026, under review at e-Informatica Software Engineering Journal).

Install

pip install same-fim

Optional baselines used in the paper's benchmark:

pip install "same-fim[baselines]"

Minimal example

import pandas as pd
from same_fim import SAME

df = pd.read_csv("my_binary_data.csv").astype("int8")

# parameter-free mode: alpha and persistence are derived from the data
est = SAME(auto_hyperparams=True, search_mode="dfs", max_k=5)
est.fit(df.values, feature_names=list(df.columns))

for r in est.result_.rules:
    if r.passes_fwer:
        print(r)

Command-line

same-mine --input data.csv --out rules.csv --mode dfs --auto --fwer-only

Reproducing the paper

From a checkout of the repository:

pip install -e ".[baselines,dev]"
python experiments/reproduce.py

This runs the domain benchmark on the five datasets (ABIDE, EEG Eye State, synth_neuro, ClinVar, Pfam-UniProt), the scaling study (n up to 10^6), the downstream classification probes on ABIDE and EEG, the auto-hyperparameter ablation, and regenerates every figure in the paper's fig_v2/ directory.

Seeded variance run (Table tab:variance)

The single-seed numbers in the main tables are supplemented by a 5-seed variance run whose output populates \TBD{...} placeholders in the LaTeX:

python experiments/bench_seeded.py \
    --seeds 5 --timeout 1800 \
    --datasets abide eeg synth_neuro clinvar pfam \
    --methods same_dfs same_opus apriori apriori_bonferroni fpgrowth \
    --out results/variance.csv

# From ../EINF-PAPER/, inject the CSV values into paper.tex in-place.
python wire_variance.py --csv ../SAME_v4/experiments/results/variance.csv \
                       --tex paper.tex

wire_variance.py writes a .bak on first run and is idempotent: re-running with a refreshed CSV overwrites any previously substituted cells.
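The placeholder substitution itself is straightforward. A minimal sketch of the idea (illustrative only, not the actual wire_variance.py, which also manages the .bak restore; the `inject` function and the key names below are invented for the example):

```python
import re

def inject(tex, values):
    r"""Replace each \TBD{key} placeholder with its value; unknown keys
    are left intact so a later run with a fuller CSV can fill them."""
    return re.sub(r"\\TBD\{([^}]*)\}",
                  lambda m: values.get(m.group(1), m.group(0)), tex)

# Hypothetical key-to-value mapping, as might be built from the CSV.
values = {"abide_same_dfs_mean": "12.4", "abide_same_dfs_std": "0.7"}
tex = r"runtime $\TBD{abide_same_dfs_mean} \pm \TBD{abide_same_dfs_std}$ s"
```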

Apriori + post-hoc Bonferroni baseline

python experiments/apriori_bonferroni.py \
    --csv datasets/abide.csv --sigma 0.10 --alpha 0.05 \
    --method bonferroni --out apriori_bonf_abide.csv

This is the reviewer-requested isolation of "adaptive threshold" from "FWER correction": the same Fisher test and the same alpha as SAME, differing only in the support threshold. See table tab:baselines_ext in paper.tex for the side-by-side at sigma = 0.10.
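For intuition, the baseline's two ingredients can be sketched with the standard library alone (schematic, not the experiment script; `fisher_one_sided` and `bonferroni_reject` are names invented here):

```python
import math

def fisher_one_sided(a, k, n1, n):
    """One-sided Fisher exact p-value: probability of observing >= a
    co-occurrences when the k rows covered by an itemset are drawn
    without replacement from n rows, n1 of which carry the label."""
    hi = min(k, n1)  # a co-occurrence count can exceed neither k nor n1
    return sum(math.comb(n1, x) * math.comb(n - n1, k - x)
               for x in range(a, hi + 1)) / math.comb(n, k)

def bonferroni_reject(pvals, alpha=0.05):
    """Post-hoc Bonferroni over m tests: reject where p <= alpha / m."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]
```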

Docker baselines

Dockerfiles for LAMP, SPuManTE, WYlight, OPUS Miner, SPMF, and Kingfisher are in experiments/baselines_ext/docker/. See that directory's README for per-image build instructions and the running order.

Core guarantees

SAME returns association rules with:

  • A data-derived support threshold combining a LAMP-style base floor s_0, a Webb (2007) layered per-level decay, a Hoeffding margin, and a Matthews-rescaled cohesion penalty.
  • Tarone–Bonferroni FWER control at a user-selected alpha (default 0.05), corrected over the exact number of testable itemsets rather than all candidates.
  • Polynomial time in n for fixed maximum itemset cardinality, with Roaring-bitmap TID lists.
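The Tarone step exploits the fact that a Fisher test at support k has a minimum attainable p-value, so only "testable" supports count toward the correction. A minimal sketch of the idea (schematic; the function names are invented here, and the package operates on itemsets rather than bare support values):

```python
import math

def fisher_min_p(k, n, n1):
    """Smallest attainable one-sided Fisher p-value at support k: the
    single most extreme 2x2 table, with all k (or all n1) hits positive."""
    a = min(k, n1)
    return math.comb(n1, a) * math.comb(n - n1, k - a) / math.comb(n, k)

def tarone_alpha(supports, n, n1, alpha=0.05):
    """Tarone's correction: the smallest m such that at most m supports
    have a minimum attainable p-value clearing alpha / m; untestable
    supports cannot produce a discovery, so they need no correction."""
    pmins = [fisher_min_p(k, n, n1) for k in supports]
    m = 1
    while sum(p <= alpha / m for p in pmins) > m:
        m += 1
    return alpha / m
```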

auto_hyperparams=True removes the Hoeffding-margin fraction and the persistence threshold from the user-facing interface, leaving only the standard statistical confidence level alpha.
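For a sense of scale, the Hoeffding component of the threshold behaves like a standard concentration bound (schematic, not the paper's exact combination formula; `delta` here is a hypothetical per-level error budget):

```python
import math

def hoeffding_margin(n, delta):
    """Two-sided Hoeffding deviation for a mean of n bounded draws:
    empirical support stays within this margin of its expectation
    with probability >= 1 - delta."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))

# The margin shrinks like 1/sqrt(n), so a data-derived threshold
# tightens automatically as the dataset grows.
```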

Citation

@article{necirbenarab2026same,
  author  = {Hamid Necir and Massyl Benarab},
  title   = {{SAME}: Similarity-Adaptive Monotonic Entropy Based Method for
             Frequent Itemsets Extraction},
  journal = {e-Informatica Software Engineering Journal},
  year    = {2026},
  note    = {Under review}
}

License

MIT. See LICENSE.
