Package for mining association rules using GUHA and Redescription mining.

Krtek

Krtek is a package for mining association rules. It implements the 4ft-Miner (GUHA) and ReReMi (Redescription mining) methods. These methods can be used to automatically generate and validate association rules.

The package is designed to be easy to extend with new quantifiers, coefficients, or even entire mining methods. It emphasizes readability and Pythonic code. To learn more about the implemented features, I strongly recommend reading the documented source code.

This package is the result of the practical part of a master's thesis at the Department of Computer Science, Palacky University, Olomouc. The aim of the thesis was to compare two association rule mining approaches: GUHA (General Unary Hypotheses Automaton) and redescription mining.

Installation

To install the library, you can use pip:

pip install krtek

Once installed, import the package as a whole or only the parts you need:

# As a whole package.
import krtek

# Import only individual parts
# For 4ft-Miner
from krtek import FourFtMiner, Literal, PartialCedent, Cedent, coefficients, quantifiers, utils
# For ReReMi
from krtek import ReReMiMiner, coefficients, quantifiers, utils

Usage examples

The two examples below show typical usage of the package. The first mines redescriptions in the Ecological Niche dataset; the second mines association rules in the Student Performance dataset [7]. More detailed examples can be found in the examples folder.

Ecological Niche example:

import pandas as pd
from krtek import ReReMiMiner, coefficients, quantifiers

# Load the data
data_RHS = pd.read_csv("mammals_small/data_RHS.csv")
data_LHS = pd.read_csv("mammals_small/data_LHS.csv")

# Preprocessing
# ...

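lhs_attributes = ...  # mammals (column names; left unspecified in this example)
rhs_attributes = ...  # temperatures of locations (column names; left unspecified in this example)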

# Here we can specify how the attributes should be used in the mining task
mammals_coefficients = {mammal: coefficients.OneCategory(True) for mammal in lhs_attributes}
temperature_coefficients = {temperature: coefficients.Sequence(1, 5) for temperature in rhs_attributes}
attributes_coefficients = {**mammals_coefficients, **temperature_coefficients}

# Defines the mining task (`data` is assumed to be the DataFrame produced by the preprocessing step)
task = ReReMiMiner(
    data,
    lhs_attributes,
    rhs_attributes,
    initial_pair_size = 10,
    beam_search_size = 10,
    max_side_size = 2,
    min_accuracy = 0.8,
    quantifier = quantifiers.Jaccard,
    operators = ["AND", "OR", "NOT"],
    category_coefficient = attributes_coefficients
)

# Starts the mining task
task.run()

# After the mining task is finished, we can print the statistics and access the results
task.print_run_info()
task.result
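
To make the `quantifiers.Jaccard` and `min_accuracy = 0.8` settings above more concrete, here is a minimal sketch of the Jaccard index as it is usually defined in redescription mining: the number of rows where both sides of a redescription hold, divided by the number of rows where at least one side holds. It uses plain Python and does not call krtek. The `jaccard` helper, the toy rows, and the column names `moose` and `max_temp_july` are hypothetical, and reading `min_accuracy` as a lower bound on this value is an assumption about how the package interprets the parameter.

```python
def jaccard(lhs_support: set, rhs_support: set) -> float:
    """Jaccard index of two support sets (sets of row indices)."""
    union = lhs_support | rhs_support
    if not union:
        # Neither side holds anywhere; define the score as 0.
        return 0.0
    return len(lhs_support & rhs_support) / len(union)


# Hypothetical toy data: one mammal column (LHS) and one temperature column (RHS).
rows = [
    {"moose": True, "max_temp_july": 14.0},
    {"moose": True, "max_temp_july": 16.5},
    {"moose": True, "max_temp_july": 15.0},
    {"moose": False, "max_temp_july": 24.0},
    {"moose": False, "max_temp_july": 27.5},
    {"moose": True, "max_temp_july": 21.0},
]

# Left side: "moose is present". Right side: "10 <= max_temp_july <= 20".
lhs_support = {i for i, row in enumerate(rows) if row["moose"]}
rhs_support = {i for i, row in enumerate(rows) if 10.0 <= row["max_temp_july"] <= 20.0}

score = jaccard(lhs_support, rhs_support)
min_accuracy = 0.8  # same threshold as in the task definition above
is_accepted = score >= min_accuracy

print(f"Jaccard = {score:.2f}, accepted = {is_accepted}")
```

In this toy data the two sides agree on three of the four rows covered by either side, so the pair would fall just short of a 0.8 threshold.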

Student Performance example:

import pandas as pd
from krtek import FourFtMiner, Literal, PartialCedent, Cedent, coefficients, quantifiers

# Load the data
data_students = pd.read_csv('student performance/student-por.csv', sep=';')

# Preprocessing
# ...

# Here we can specify how the attributes should be used in the mining task
# Partial Cedent – Family
family = PartialCedent(
    [
        Literal("famsize", coefficients.Subset(1, 1)),
        Literal("Pstatus", coefficients.Subset(1, 1)),
        Literal("Mjob", coefficients.Subset(1, 1)),
        Literal("Fjob", coefficients.Subset(1, 1)),
        Literal("guardian", coefficients.Subset(1, 1)),
        Literal("Medu", coefficients.Sequence(1, 2)),
        Literal("Fedu", coefficients.Sequence(1, 2)),
        Literal("famrel", coefficients.Sequence(1, 2)),
    ],
    1, 2, name="Family"
)

antecedent = Cedent([family])

# Partial Cedent – Performance
student_performance = PartialCedent(
    [
        Literal("G3_grade", coefficients.Sequence(1, 2)),
    ],
    1, name="Performance"
)

succedent = Cedent([student_performance])

# Quantifier specification
quantifier = [quantifiers.FoundedImplication(0.75, 100)]

# Defines the mining task
task = FourFtMiner(data_students, antecedent, succedent, quantifier)

# Starts the mining task
task.run()

# After the mining task is finished, we can print the statistics and access the results
task.print_run_info()
task.result
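
For readers new to GUHA, the following sketch shows what a founded implication quantifier checks, using the standard definition from the 4ft-Miner literature [4], [5]. A rule is evaluated on a four-fold table with frequencies a, b, c, d, and it holds when a / (a + b) >= p and a >= base. The code is plain Python and does not call krtek. The names `FourFoldTable`, `four_fold_table`, and `founded_implication` are hypothetical, and reading the arguments of `quantifiers.FoundedImplication(0.75, 100)` as p = 0.75 and base = 100 is an assumption about the package's argument order.

```python
from typing import NamedTuple, Sequence


class FourFoldTable(NamedTuple):
    a: int  # antecedent holds, succedent holds
    b: int  # antecedent holds, succedent does not
    c: int  # antecedent does not hold, succedent holds
    d: int  # neither holds


def four_fold_table(antecedent: Sequence[bool], succedent: Sequence[bool]) -> FourFoldTable:
    """Count the four combinations of antecedent and succedent over all rows."""
    a = b = c = d = 0
    for ant, suc in zip(antecedent, succedent):
        if ant and suc:
            a += 1
        elif ant:
            b += 1
        elif suc:
            c += 1
        else:
            d += 1
    return FourFoldTable(a, b, c, d)


def founded_implication(table: FourFoldTable, p: float, base: int) -> bool:
    """Return True when a / (a + b) >= p and a >= base."""
    covered = table.a + table.b
    if covered == 0:
        # The antecedent never holds, so the rule has no support.
        return False
    return table.a >= base and table.a / covered >= p


# Hypothetical toy data: eight rows, one boolean per row for each side of the rule.
antecedent_holds = [True, True, True, True, False, False, False, False]
succedent_holds = [True, True, True, False, True, False, False, False]

table = four_fold_table(antecedent_holds, succedent_holds)
holds_small_base = founded_implication(table, p=0.75, base=3)
holds_large_base = founded_implication(table, p=0.75, base=4)

print(table, holds_small_base, holds_large_base)
```

The second call fails only because of the base: the confidence is still 3/4, but just three rows satisfy both sides. This is the role the second argument plays in the sketch.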

References

[1] P. Hájek and T. Havránek, Mechanizing hypothesis formation. 1978. doi: 10.1007/978-3-642-66943-9.

[2] E. Galbrun and P. Miettinen, Redescription mining. 2017. doi: 10.1007/978-3-319-72889-6.

[3] E. Galbrun and P. Miettinen, “From black and white to full color: extending redescription mining outside the Boolean world,” Statistical Analysis and Data Mining: The ASA Data Science Journal, vol. 5, no. 4, pp. 284–303, Apr. 2012, doi: 10.1002/sam.11145.

[4] J. Rauch and M. Šimůnek, “An alternative approach to mining association rules,” Foundations of Data Mining and Knowledge Discovery, pp. 211–231, Jan. 2005, [Online]. Available: https://dblp.uni-trier.de/rec/series/sci/RauchS05.html.

[5] J. Rauch, “Classes of Association Rules: An Overview,” in Studies in computational intelligence, 2008, pp. 315–337. doi: 10.1007/978-3-540-78488-3_19.

[6] J. Rauch, M. Šimůnek, D. Chudán, and P. Máša, Mechanizing hypothesis formation: Principles and Case Studies, 1st ed. 2022. doi: 10.1201/9781003091448.

[7] P. Cortez, “Student Performance,” UCI Machine Learning Repository, 2008, doi: 10.24432/C5TG7T.

