Skip to main content

Fast Kingfisher rule mining (top-K non-redundant dependency rules via Fisher's exact test) using parallel Branch and Bound.

Project description

kingfisher-bnb

Fast Kingfisher rule mining — the top-K non-redundant, statistically significant dependency rules of a binary/transactional dataset — implemented as a parallel Best-First Branch & Bound search in Rust with a small Python API.

Kingfisher finds rules A ⇒ B (and negative rules A ⇒ ¬B) ranked by a statistical measure, without a minimum-support threshold. False discovery is controlled by algorithmic design — top-K + a non-redundancy rule (a specific rule must beat all its generalizations) + a raw significance cutoff — not by family-wise/FDR correction, so the Fisher scores it returns are raw, uncorrected p-values.

Install

pip install kingfisher-bnb        # prebuilt wheels (Linux/macOS/Windows)

To build from source you need a Rust toolchain (>= 1.85) and maturin:

maturin develop --release         # builds + installs into the active venv

Python usage

import math
import kingfisher_bnb as kf

# Dense rows -> sparse item lists (and an id<->name mapping)
dense = [[1, 1, 0],
         [1, 0, 1],
         [0, 1, 1]]
sparse, id_to_name, _ = kf.dense_to_sparse(dense, ["Apple", "Banana", "Cherry"])

rules = kf.find_rules_from_data(
    data=sparse,
    k=2,            # max attribute index (n_columns - 1)
    q=10,           # top-K rules to keep
    l_max=3,        # max rule length (antecedent + consequent)
    t_type=3,       # 1=positive, 2=negative, 3=both
    m_threshold=1.0,  # raw cutoff (p<=1 => keep all top-q for Fisher)
    measure_type=1,   # 1/2=Fisher's exact, 3=chi^2, 4=mutual info, 5=leverage
)

for r in rules:
    ant = " AND ".join(id_to_name[i] for i in r.antecedent)
    cons = id_to_name[r.consequent]
    sign = "NOT " if r.is_negative else ""
    p = math.exp(r.measure_value)      # Fisher: measure_value = ln(p)
    print(f"IF {ant} THEN {sign}{cons}  (p={p:.4g})")

Each Rule exposes: antecedent (list[int]), consequent (int), is_negative (bool), measure_value (float — ln(p) for Fisher, a negated statistic otherwise so that smaller is always better), and the contingency counts frequency_x, frequency_xa, frequency_a.

CLI

kingfisher --data data/test_data.txt --cols 4 --t-type 3 --measure-type 1

Data format: one transaction per line, space-separated attribute indices.

Measures

measure_type Measure Returns
1, 2 Fisher's exact test ln(p) (raw p-value)
3 Chi-squared test statistic
4 Mutual information bits
5 Leverage effect size

t_type is a separate axis (rule direction): 1 positive, 2 negative, 3 both.

Multiple-testing correction (Tarone)

find_rules_from_data returns raw p-values. To control the family-wise error rate over the many pairwise Fisher tests, tarone() computes the effective number of tests via Tarone's method — counting only testable hypotheses (pairs whose minimum attainable p, given the margins, can reach significance). It reuses the same bounds the search uses, so it is cheap, and far more powerful than plain Bonferroni over C(n_items, 2).

res = kf.tarone(sparse, k=2, alpha=0.05)
res.m_eff           # effective number of tests (<= C(n_items, 2))
res.threshold       # corrected raw-p cutoff = alpha / m_eff
res.m_eff_at(0.01)  # any other alpha, no recomputation over the data
res.spectrum        # [(min_p, count), ...] minimal-p histogram

# a pair is significant when its raw Fisher p <= res.threshold

Correction is defined on p-values, so it applies to the Fisher measure only (measure_type 1/2); chi²/MI/leverage are effect sizes with no p-value to correct. Pairwise (single-item) hypotheses.

Attribution

This is an independent Rust implementation of the Kingfisher algorithm from Wilhelmiina Hämäläinen's published work — written from the papers, not ported from the original C source. If you use it in research, please cite:

Hämäläinen, W. Efficient discovery of the top-K optimal dependency rules with Fisher's exact test of significance. ICDM 2010.

The tarone() correction follows:

Tarone, R. E. A modified Bonferroni method for discrete data. Biometrics 46(2):515–522, 1990.

For its application to pattern mining, see also Terada et al., Statistical significance of combinatorial regulations (PNAS 2013, "LAMP"), and Hämäläinen & Webb, A tutorial on statistically sound pattern discovery (DMKD 2019).

License

MIT — see LICENSE.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

kingfisher_bnb-0.2.0.tar.gz (25.9 kB view details)

Uploaded Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

kingfisher_bnb-0.2.0-cp38-abi3-win_amd64.whl (227.0 kB view details)

Uploaded CPython 3.8+Windows x86-64

kingfisher_bnb-0.2.0-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (378.9 kB view details)

Uploaded CPython 3.8+manylinux: glibc 2.17+ x86-64

kingfisher_bnb-0.2.0-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (372.2 kB view details)

Uploaded CPython 3.8+manylinux: glibc 2.17+ ARM64

kingfisher_bnb-0.2.0-cp38-abi3-macosx_11_0_arm64.whl (330.7 kB view details)

Uploaded CPython 3.8+macOS 11.0+ ARM64

File details

Details for the file kingfisher_bnb-0.2.0.tar.gz.

File metadata

  • Download URL: kingfisher_bnb-0.2.0.tar.gz
  • Upload date:
  • Size: 25.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.11.25 {"installer":{"name":"uv","version":"0.11.25","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for kingfisher_bnb-0.2.0.tar.gz
Algorithm Hash digest
SHA256 b69bdd5908ac45011ea323c49c82ff4bce6f906dd765c97d6f1222e3085b16a9
MD5 62f42ddc186780642d0825d3b370c1f1
BLAKE2b-256 270f7d33a58ebe1178af0f7eb743a6f5e8af8f73a931f7fef4e01862405dab76

See more details on using hashes here.

File details

Details for the file kingfisher_bnb-0.2.0-cp38-abi3-win_amd64.whl.

File metadata

  • Download URL: kingfisher_bnb-0.2.0-cp38-abi3-win_amd64.whl
  • Upload date:
  • Size: 227.0 kB
  • Tags: CPython 3.8+, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.11.25 {"installer":{"name":"uv","version":"0.11.25","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for kingfisher_bnb-0.2.0-cp38-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 3a6f3457c2d2dc28a0a191d2098c57577c459f6449aab1964f9e43a3488fcf24
MD5 959522a24d3f03d6f2273036f0c25b7a
BLAKE2b-256 8f798cd8d7d73ad2bbaabdcdb5808103620b8dfdcfe8c7ce808461722e39b972

See more details on using hashes here.

File details

Details for the file kingfisher_bnb-0.2.0-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

  • Download URL: kingfisher_bnb-0.2.0-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
  • Upload date:
  • Size: 378.9 kB
  • Tags: CPython 3.8+, manylinux: glibc 2.17+ x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.11.25 {"installer":{"name":"uv","version":"0.11.25","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for kingfisher_bnb-0.2.0-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 e10da45d044fcb70f662dabe5c0841b70b40e7c2b5961ea7f528e56b51cd01e9
MD5 1b57cdeccc6d1bfd9d63b3da3c7bafa5
BLAKE2b-256 7d9ba081b938125600f4897ade79709726154864ef0ed1744f94a90fe2e91808

See more details on using hashes here.

File details

Details for the file kingfisher_bnb-0.2.0-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File metadata

  • Download URL: kingfisher_bnb-0.2.0-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
  • Upload date:
  • Size: 372.2 kB
  • Tags: CPython 3.8+, manylinux: glibc 2.17+ ARM64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.11.25 {"installer":{"name":"uv","version":"0.11.25","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for kingfisher_bnb-0.2.0-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 618c9b14acde029b1ad6ac37394794bf06df2661bb2938d7342f5a0b58d8cc98
MD5 a3573c6464b150a1589d021340aec523
BLAKE2b-256 6a928fa6321436444429338fd67fbebecff973df1e9fb7252d327ca46fde3852

See more details on using hashes here.

File details

Details for the file kingfisher_bnb-0.2.0-cp38-abi3-macosx_11_0_arm64.whl.

File metadata

  • Download URL: kingfisher_bnb-0.2.0-cp38-abi3-macosx_11_0_arm64.whl
  • Upload date:
  • Size: 330.7 kB
  • Tags: CPython 3.8+, macOS 11.0+ ARM64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.11.25 {"installer":{"name":"uv","version":"0.11.25","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for kingfisher_bnb-0.2.0-cp38-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 5d00ef655fa0a4f9df275ba889b51a2ff6985b959e73803396fc2341ab3f88b2
MD5 78b4ffe449dbb6733d7bf924a18f9767
BLAKE2b-256 86e279f91d7595d0eca3d31561e4bc7c8232c8519fed430d7573827b22057794

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page