Python package to mine association rules in datasets

ruleminer

Documentation: ReadTheDocs | License: MIT | Code style: black

Python package to discover association rules in Pandas DataFrames.

This package contains the code accompanying the paper "Discovering and ranking validation rules in supervisory data".

The documentation can be found here.

Here is what the package does:

  • Generate human-readable validation rules from a Pandas DataFrame, using rule templates that contain regular expressions

    • available functions: min, max, abs, quantile, sum, substr, split, count, sumif and countif

    • including parameters for metric filters and rule precisions (including XBRL tolerances)

  • Evaluate rules and calculate association rule metrics

    • available metrics: abs support, abs exceptions, confidence, support, added value, casual confidence, casual support, conviction, lift and rule power factor
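To make the metrics concrete, here is a minimal sketch in plain Python that computes several of them for a single rule of the form "if A then B". It uses the common textbook definitions; the exact definitions in ruleminer may differ, and the function `rule_metrics` and the tiny dataset are hypothetical and not part of the package.

```python
def rule_metrics(rows, antecedent, consequent):
    """Textbook metrics for the rule 'if antecedent then consequent'."""
    n = len(rows)
    n_if = sum(1 for row in rows if antecedent(row))
    n_then = sum(1 for row in rows if consequent(row))
    n_both = sum(1 for row in rows if antecedent(row) and consequent(row))
    if n_if == 0:
        raise ValueError("the if part never holds, so confidence is undefined")

    confidence = n_both / n_if   # share of 'if' rows that also satisfy 'then'
    p_then = n_then / n          # share of all rows that satisfy 'then'
    return {
        "abs support": n_both,            # rows satisfying the whole rule
        "abs exceptions": n_if - n_both,  # rows where 'if' holds but 'then' fails
        "support": n_both / n,
        "confidence": confidence,
        "added value": confidence - p_then,
        "lift": confidence / p_then if p_then > 0 else float("nan"),
        "conviction": (1 - p_then) / (1 - confidence) if confidence < 1 else float("inf"),
    }


# Hypothetical data: the third row is an exception to the rule below.
rows = [
    {"Type": "life_insurer", "TP-life": 100.0},
    {"Type": "life_insurer", "TP-life": 80.0},
    {"Type": "life_insurer", "TP-life": 0.0},
    {"Type": "non-life_insurer", "TP-life": 0.0},
    {"Type": "non-life_insurer", "TP-life": 0.0},
]

# Rule: if ({"Type"} == "life_insurer") then ({"TP-life"} > 0)
metrics = rule_metrics(
    rows,
    antecedent=lambda row: row["Type"] == "life_insurer",
    consequent=lambda row: row["TP-life"] > 0,
)
print(metrics)
```

With this data the rule has an abs support of 2 and one exception, so its confidence is 2/3.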

Here are some examples of rule templates containing regexes that can be used to generate validation rules:

  • if ({"Type"} == ".*") then ({".*"} > 0)

  • if ({".*"} > 0) then (({".*"} == 0) & ({".*"} > 0))

  • (({".*"} + {".*"} + {".*"}) == {".*"})

  • ({"Own funds"} <= quantile({"Own funds"}, 0.95))

  • (substr({"Type"}, 0, 1) in ["a", "b"])
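As an illustration of what a quantile template such as the fourth one expresses, the sketch below checks a column against its own 0.95 quantile in plain Python. It assumes linear interpolation between data points (the default convention in NumPy and pandas); whether ruleminer uses exactly this convention is an assumption, and the helper `quantile` and the values are hypothetical.

```python
import math


def quantile(values, q):
    """Quantile of a non-empty list, with linear interpolation between points."""
    ordered = sorted(values)
    position = q * (len(ordered) - 1)
    lower = math.floor(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


# Hypothetical "Own funds" column with one unusually large value.
own_funds = [10.0, 20.0, 30.0, 40.0, 100.0]

# Rule: ({"Own funds"} <= quantile({"Own funds"}, 0.95))
threshold = quantile(own_funds, 0.95)
exceptions = [i for i, value in enumerate(own_funds) if not value <= threshold]
print(threshold, exceptions)
```

Here the threshold lies between the two largest values, so only the last row is reported as an exception.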

The first template generates (with the dataset described in the Usage section) rules like

  • if ({"Type"} == "non-life_insurer") then ({"TP-nonlife"} > 0)

  • if ({"Type"} == "life_insurer") then ({"TP-life"} > 0)
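The expansion from template to rules can be sketched in plain Python as follows. This is not ruleminer's implementation: the function `generate_rules`, its parameters, the confidence filter and the small dataset are all hypothetical, and the sketch only handles templates of the shape `if ({"<column>"} == "<value regex>") then ({"<column regex>"} > 0)`.

```python
import re


def generate_rules(rows, if_column, value_regex, column_regex, min_confidence=1.0):
    """Expand the regexes against the data and keep rules that hold often enough."""
    # Candidate values for the if part: distinct strings matching the value regex.
    values = sorted(
        {row[if_column] for row in rows
         if isinstance(row[if_column], str) and re.fullmatch(value_regex, row[if_column])}
    )
    # Candidate columns for the then part: numeric columns matching the column regex.
    columns = [
        name for name in rows[0]
        if name != if_column
        and re.fullmatch(column_regex, name)
        and all(isinstance(row[name], (int, float)) for row in rows)
    ]
    rules = []
    for value in values:
        matching = [row for row in rows if row[if_column] == value]
        for column in columns:
            satisfied = sum(1 for row in matching if row[column] > 0)
            if matching and satisfied / len(matching) >= min_confidence:
                rules.append(
                    f'if ({{"{if_column}"}} == "{value}") then ({{"{column}"}} > 0)'
                )
    return rules


# Hypothetical dataset, not the one from the Usage section.
rows = [
    {"Type": "life_insurer", "TP-life": 100.0, "TP-nonlife": 0.0},
    {"Type": "life_insurer", "TP-life": 80.0, "TP-nonlife": 0.0},
    {"Type": "non-life_insurer", "TP-life": 0.0, "TP-nonlife": 50.0},
    {"Type": "non-life_insurer", "TP-life": 0.0, "TP-nonlife": 70.0},
]

rules = generate_rules(rows, if_column="Type", value_regex=".*", column_regex=".*")
for rule in rules:
    print(rule)
```

Candidate rules that do not reach the confidence threshold, such as life insurers having non-life technical provisions, are dropped.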

These generated validation rules can then be used to validate new datasets.
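A minimal sketch of such a validation step, in plain Python, is shown below. It parses only rules of the shape `if ({"A"} == "x") then ({"B"} > 0)` and lists the rows of a new dataset that violate the rule. The pattern `RULE_PATTERN`, the function `find_exceptions` and the data are hypothetical; ruleminer itself evaluates far more general expressions.

```python
import re

# Matches only rules of the form: if ({"A"} == "x") then ({"B"} > 0)
RULE_PATTERN = re.compile(r'if \(\{"(.+?)"\} == "(.+?)"\) then \(\{"(.+?)"\} > 0\)')


def find_exceptions(rule, rows):
    """Return the indices of rows where the if part holds but the then part fails."""
    match = RULE_PATTERN.fullmatch(rule)
    if match is None:
        raise ValueError(f"unsupported rule shape: {rule}")
    if_column, value, then_column = match.groups()
    return [
        index for index, row in enumerate(rows)
        if row[if_column] == value and not row[then_column] > 0
    ]


# Hypothetical new dataset: Insurer B violates the rule.
new_rows = [
    {"Name": "Insurer A", "Type": "life_insurer", "TP-life": 120.0},
    {"Name": "Insurer B", "Type": "life_insurer", "TP-life": 0.0},
    {"Name": "Insurer C", "Type": "non-life_insurer", "TP-life": 0.0},
]

rule = 'if ({"Type"} == "life_insurer") then ({"TP-life"} > 0)'
violations = find_exceptions(rule, new_rows)
print([new_rows[i]["Name"] for i in violations])
```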

History

0.1.0 (2021-11-21)

  • First release on PyPI.

0.1.1 (2021-11-23)

  • Added more documentation to the README text

0.1.2 (2022-1-20)

  • Bug fixes for some complex expressions

0.1.3 (2022-1-26)

  • Optimized rule generation process

0.1.4 (2022-1-26)

  • Evaluated columns in the then part now depend on the if part of the rule

0.1.5 (2022-1-30)

  • Rules with quantiles added (including evaluation of intermediate results)

0.1.6 and 0.1.7 (2022-2-1)

  • A number of optimizations in the rule generation process

0.1.8 (2022-2-3)

  • Rule power factor metric added

0.1.12 (2022-5-11)

  • Optimizations: metric calculations now use boolean masks on the DataFrame

0.1.14 (2023-4-17)

  • Nested functions added

  • substr and in operators added

0.1.16 (2023-8-3)

  • Templates now do not necessarily have to contain a regex

  • Bug fix when evaluating rules that contain columns that do not exist

  • Templates now can start with ‘if () then’

0.1.17 (2023-8-8)

  • Generate rules now runs without specified data

0.1.18 (2023-8-8)

  • Dedicated function added for template to rule conversion without data

  • Exp sign changed from ^ to **

0.1.19 (2023-8-27)

  • Small fixes to rule conversion without data

0.1.20 (2023-8-29)

  • Small fixes in evaluating rules with syntax errors

0.1.21 (2023-10-11)

  • Changed sum to nansum

  • Added tolerance functionality for ==

0.1.22 (2023-10-17)

  • Added tolerance functionality for !=, <, <=, > and >=

  • Updated docs

0.1.23 (2023-10-18)

  • Added nested conditions in functions

0.1.24 (2023-10-25)

  • Added sumif and improved tolerance functionality

