Python package to mine association rules in datasets
ruleminer
Python package to discover association rules in Pandas DataFrames.
This package contains the code accompanying the paper "Discovering and ranking validation rules in supervisory data".
- Free software: MIT/X license
- Documentation: https://ruleminer.readthedocs.io/en/latest
Features
Here is what the package does:
- Generate human-readable validation rules using rule templates containing regular expressions and a Pandas DataFrame dataset
  - available functions: min, max, abs, quantile, sum, substr, split, count, sumif and countif
  - including parameters for metric filters and rule precisions (including XBRL tolerances)
- Evaluate rules and calculate association rule metrics
  - available metrics: abs support, abs exceptions, confidence, support, added value, casual confidence, casual support, conviction, lift and rule power factor
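As an illustration of the metrics listed above, the sketch below computes several of them for a single if-then rule from row counts. The formulas are the common textbook definitions from the association rule literature; they are assumptions here, and the package's own definitions may differ in detail. The function name `rule_metrics` and the counts are hypothetical.

```python
import math


def rule_metrics(n: int, n_if: int, n_then: int, n_both: int) -> dict:
    """Textbook association rule metrics for a rule "if A then B".

    n      -- total number of rows
    n_if   -- rows where the antecedent A holds (assumed > 0)
    n_then -- rows where the consequent B holds (assumed > 0)
    n_both -- rows where both A and B hold
    """
    support = n_both / n
    confidence = n_both / n_if
    p_then = n_then / n
    return {
        "abs support": n_both,
        # Rows where the "if" part holds but the "then" part does not.
        "abs exceptions": n_if - n_both,
        "support": support,
        "confidence": confidence,
        "added value": confidence - p_then,
        "lift": confidence / p_then,
        # Conviction is unbounded when the rule has no exceptions.
        "conviction": math.inf
        if confidence == 1
        else (1 - p_then) / (1 - confidence),
    }


# Hypothetical counts: 10 rows, A holds in 5, B holds in 6, both hold in 4.
metrics = rule_metrics(n=10, n_if=5, n_then=6, n_both=4)
```

With these counts the rule has 4 supporting rows and 1 exception, which gives a confidence of 0.8.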
Here are some examples of rule templates containing regexes from which validation rules can be generated:
- if ({"Type"} == ".*") then ({".*"} > 0)
- if ({".*"} > 0) then (({".*"} == 0) & ({".*"} > 0))
- (({".*"} + {".*"} + {".*"}) == {".*"})
- ({"Own funds"} <= quantile({"Own funds"}, 0.95))
- (substr({"Type"}, 0, 1) in ["a", "b"])
With the dataset described in the Usage section, the first template generates rules such as:
- if ({"Type"} == "non-life_insurer") then ({"TP-nonlife"} > 0)
- if ({"Type"} == "life_insurer") then ({"TP-life"} > 0)
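How a template with regular expressions turns into concrete candidate rules can be sketched as follows. This is a simplified illustration under stated assumptions, not the package's implementation: the helper `expand_template`, the column names, and the `Type` values are hypothetical, and the regex `.*` is used as the wildcard pattern.

```python
import re
from itertools import product


def expand_template(columns, type_values, value_regex=".*", column_regex=".*"):
    """Enumerate candidate rules of the form
    if ({"Type"} == <value>) then ({<column>} > 0).
    """
    # Values of the "Type" column that match the first regex.
    values = [v for v in type_values if re.fullmatch(value_regex, v)]
    # Column names (other than "Type") that match the second regex.
    targets = [
        c for c in columns if c != "Type" and re.fullmatch(column_regex, c)
    ]
    return [
        f'if ({{"Type"}} == "{v}") then ({{"{c}"}} > 0)'
        for v, c in product(values, targets)
    ]


# Hypothetical column names and "Type" values.
candidates = expand_template(
    columns=["Type", "TP-life", "TP-nonlife"],
    type_values=["life_insurer", "non-life_insurer"],
)
```

This naive expansion yields every combination, including candidates that do not hold in the data. A filter on metrics such as confidence would then be needed to keep only rules like the two shown above.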
These generated validation rules can then be used to validate new datasets.
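Checking a generated rule against a new dataset can be sketched directly in Pandas. This does not use the ruleminer API; the DataFrame contents below are made up for illustration, and only the rule itself is taken from the examples above.

```python
import pandas as pd

# Hypothetical new dataset to validate.
new_data = pd.DataFrame(
    {
        "Name": ["Insurer 1", "Insurer 2", "Insurer 3"],
        "Type": ["life_insurer", "life_insurer", "non-life_insurer"],
        "TP-life": [1000.0, 0.0, 0.0],
    }
)

# Rule: if ({"Type"} == "life_insurer") then ({"TP-life"} > 0)
antecedent = new_data["Type"] == "life_insurer"
consequent = new_data["TP-life"] > 0

# Rows where the "if" part holds but the "then" part does not.
exceptions = new_data[antecedent & ~consequent]
```

Rows for which the antecedent is false (here the non-life insurer) are not covered by the rule and are therefore never reported as exceptions.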
Contributors
- Willem Jan Willemse https://github.com/wjwillemse