Fast Kingfisher rule mining (top-K non-redundant dependency rules via Fisher's exact test) using parallel Branch and Bound.
Project description
kingfisher-bnb
Fast Kingfisher rule mining — the top-K non-redundant, statistically significant dependency rules of a binary/transactional dataset — implemented as a parallel Best-First Branch & Bound search in Rust with a small Python API.
Kingfisher finds rules A ⇒ B (and negative rules A ⇒ ¬B) ranked by a
statistical measure, without a minimum-support threshold. False discovery is
controlled by algorithmic design — top-K + a non-redundancy rule (a specific rule
must beat all its generalizations) + a raw significance cutoff — not by
family-wise/FDR correction, so the Fisher scores it returns are raw,
uncorrected p-values.
Install
pip install kingfisher-bnb # prebuilt wheels (Linux/macOS/Windows)
To build from source you need a Rust toolchain (>= 1.85) and maturin:
maturin develop --release # builds + installs into the active venv
Python usage
import math
import kingfisher_bnb as kf
# Dense rows -> sparse item lists (and an id<->name mapping)
dense = [[1, 1, 0],
[1, 0, 1],
[0, 1, 1]]
sparse, id_to_name, _ = kf.dense_to_sparse(dense, ["Apple", "Banana", "Cherry"])
rules = kf.find_rules_from_data(
data=sparse,
k=2, # max attribute index (n_columns - 1)
q=10, # top-K rules to keep
l_max=3, # max rule length (antecedent + consequent)
t_type=3, # 1=positive, 2=negative, 3=both
m_threshold=1.0, # raw cutoff (p<=1 => keep all top-q for Fisher)
measure_type=1, # 1/2=Fisher's exact, 3=chi^2, 4=mutual info, 5=leverage
)
for r in rules:
ant = " AND ".join(id_to_name[i] for i in r.antecedent)
cons = id_to_name[r.consequent]
sign = "NOT " if r.is_negative else ""
p = math.exp(r.measure_value) # Fisher: measure_value = ln(p)
print(f"IF {ant} THEN {sign}{cons} (p={p:.4g})")
Each Rule exposes: antecedent (list[int]), consequent (int),
is_negative (bool), measure_value (float — ln(p) for Fisher, a negated
statistic otherwise so that smaller is always better), and the contingency
counts frequency_x, frequency_xa, frequency_a.
CLI
kingfisher --data data/test_data.txt --cols 4 --t-type 3 --measure-type 1
Data format: one transaction per line, space-separated attribute indices.
Measures
measure_type |
Measure | Returns |
|---|---|---|
| 1, 2 | Fisher's exact test | ln(p) (raw p-value) |
| 3 | Chi-squared | test statistic |
| 4 | Mutual information | bits |
| 5 | Leverage | effect size |
t_type is a separate axis (rule direction): 1 positive, 2 negative, 3 both.
Multiple-testing correction (Tarone)
find_rules_from_data returns raw p-values. To control the family-wise error
rate over the many pairwise Fisher tests, tarone() computes the effective
number of tests via Tarone's method — counting only testable hypotheses
(pairs whose minimum attainable p, given the margins, can reach significance). It
reuses the same bounds the search uses, so it is cheap, and far more powerful than
plain Bonferroni over C(n_items, 2).
res = kf.tarone(sparse, k=2, alpha=0.05)
res.m_eff # effective number of tests (<= C(n_items, 2))
res.threshold # corrected raw-p cutoff = alpha / m_eff
res.m_eff_at(0.01) # any other alpha, no recomputation over the data
res.spectrum # [(min_p, count), ...] minimal-p histogram
# a pair is significant when its raw Fisher p <= res.threshold
Correction is defined on p-values, so it applies to the Fisher measure only
(measure_type 1/2); chi²/MI/leverage are effect sizes with no p-value to correct.
Pairwise (single-item) hypotheses.
Attribution
This is an independent Rust implementation of the Kingfisher algorithm from Wilhelmiina Hämäläinen's published work — written from the papers, not ported from the original C source. If you use it in research, please cite:
Hämäläinen, W. Efficient discovery of the top-K optimal dependency rules with Fisher's exact test of significance. ICDM 2010.
The tarone() correction follows:
Tarone, R. E. A modified Bonferroni method for discrete data. Biometrics 46(2):515–522, 1990.
For its application to pattern mining, see also Terada et al., Statistical significance of combinatorial regulations (PNAS 2013, "LAMP"), and Hämäläinen & Webb, A tutorial on statistically sound pattern discovery (DMKD 2019).
License
MIT — see LICENSE.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distributions
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file kingfisher_bnb-0.2.0.tar.gz.
File metadata
- Download URL: kingfisher_bnb-0.2.0.tar.gz
- Upload date:
- Size: 25.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: uv/0.11.25 {"installer":{"name":"uv","version":"0.11.25","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
b69bdd5908ac45011ea323c49c82ff4bce6f906dd765c97d6f1222e3085b16a9
|
|
| MD5 |
62f42ddc186780642d0825d3b370c1f1
|
|
| BLAKE2b-256 |
270f7d33a58ebe1178af0f7eb743a6f5e8af8f73a931f7fef4e01862405dab76
|
File details
Details for the file kingfisher_bnb-0.2.0-cp38-abi3-win_amd64.whl.
File metadata
- Download URL: kingfisher_bnb-0.2.0-cp38-abi3-win_amd64.whl
- Upload date:
- Size: 227.0 kB
- Tags: CPython 3.8+, Windows x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: uv/0.11.25 {"installer":{"name":"uv","version":"0.11.25","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
3a6f3457c2d2dc28a0a191d2098c57577c459f6449aab1964f9e43a3488fcf24
|
|
| MD5 |
959522a24d3f03d6f2273036f0c25b7a
|
|
| BLAKE2b-256 |
8f798cd8d7d73ad2bbaabdcdb5808103620b8dfdcfe8c7ce808461722e39b972
|
File details
Details for the file kingfisher_bnb-0.2.0-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.
File metadata
- Download URL: kingfisher_bnb-0.2.0-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Upload date:
- Size: 378.9 kB
- Tags: CPython 3.8+, manylinux: glibc 2.17+ x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: uv/0.11.25 {"installer":{"name":"uv","version":"0.11.25","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
e10da45d044fcb70f662dabe5c0841b70b40e7c2b5961ea7f528e56b51cd01e9
|
|
| MD5 |
1b57cdeccc6d1bfd9d63b3da3c7bafa5
|
|
| BLAKE2b-256 |
7d9ba081b938125600f4897ade79709726154864ef0ed1744f94a90fe2e91808
|
File details
Details for the file kingfisher_bnb-0.2.0-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.
File metadata
- Download URL: kingfisher_bnb-0.2.0-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
- Upload date:
- Size: 372.2 kB
- Tags: CPython 3.8+, manylinux: glibc 2.17+ ARM64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: uv/0.11.25 {"installer":{"name":"uv","version":"0.11.25","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
618c9b14acde029b1ad6ac37394794bf06df2661bb2938d7342f5a0b58d8cc98
|
|
| MD5 |
a3573c6464b150a1589d021340aec523
|
|
| BLAKE2b-256 |
6a928fa6321436444429338fd67fbebecff973df1e9fb7252d327ca46fde3852
|
File details
Details for the file kingfisher_bnb-0.2.0-cp38-abi3-macosx_11_0_arm64.whl.
File metadata
- Download URL: kingfisher_bnb-0.2.0-cp38-abi3-macosx_11_0_arm64.whl
- Upload date:
- Size: 330.7 kB
- Tags: CPython 3.8+, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: uv/0.11.25 {"installer":{"name":"uv","version":"0.11.25","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
5d00ef655fa0a4f9df275ba889b51a2ff6985b959e73803396fc2341ab3f88b2
|
|
| MD5 |
78b4ffe449dbb6733d7bf924a18f9767
|
|
| BLAKE2b-256 |
86e279f91d7595d0eca3d31561e4bc7c8232c8519fed430d7573827b22057794
|