A minimalistic framework for numerical association rule mining

NiaARM - A minimalistic framework for Numerical Association Rule Mining


NiaARM is a framework for Association Rule Mining (ARM) based on nature-inspired optimization algorithms. It is written entirely in Python and runs on all platforms. NiaARM lets users automatically preprocess the data in a transaction database, search for association rules, and print the rules found in a readable format. In addition to categorical attributes, the framework supports integer and real-valued attributes. Association rule mining is formulated as an optimization problem and solved with nature-inspired algorithms from the related NiaPy framework.

Detailed insights

The current version includes (but is not limited to) the following functions:

  • loading datasets in CSV format,
  • preprocessing of data,
  • searching for association rules,
  • providing output of mined association rules,
  • generating statistics about mined association rules,
  • visualization of association rules,
  • association rule text mining (experimental).

Installation

pip

Install NiaARM with pip:

pip install niaarm

To install NiaARM on Alpine Linux, enable the Community repository and use:

$ apk add py3-niaarm

To install NiaARM on Arch Linux, please use an AUR helper:

$ yay -Syyu python-niaarm

To install NiaARM on Fedora, use:

$ dnf install python3-niaarm

Usage

Loading data

In NiaARM, data loading is done via the Dataset class. There are two options for loading data:

Option 1: From a pandas DataFrame (recommended)

import pandas as pd
from niaarm import Dataset


df = pd.read_csv('datasets/Abalone.csv')
# preprocess data...
data = Dataset(df)
print(data) # printing the dataset will generate a feature report

Option 2: Directly from a CSV file

from niaarm import Dataset


data = Dataset('datasets/Abalone.csv')
print(data)
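
To give a rough idea of what a feature report over mixed attribute types could contain, here is a minimal standard-library sketch. It is not NiaARM's implementation: the `infer_features` helper, the `dtype` labels, and the sample rows are all hypothetical and only illustrate the distinction between integer, real-valued, and categorical attributes.

```python
import csv
import io


def infer_features(rows):
    """Summarize each column of a list of dict rows (all values are strings).

    Hypothetical helper: numeric columns get a min/max range,
    everything else is treated as categorical.
    """
    summary = {}
    for name in rows[0]:
        values = [row[name] for row in rows]
        try:
            numbers = [int(v) for v in values]
            dtype = "int"
        except ValueError:
            try:
                numbers = [float(v) for v in values]
                dtype = "float"
            except ValueError:
                numbers = None
                dtype = "cat"
        if numbers is not None:
            summary[name] = {"dtype": dtype, "min": min(numbers), "max": max(numbers)}
        else:
            summary[name] = {"dtype": dtype, "categories": sorted(set(values))}
    return summary


# Made-up sample data, loosely shaped like an Abalone-style CSV.
SAMPLE_CSV = """Sex,Length,Rings
M,0.455,15
F,0.530,9
I,0.330,7
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
features = infer_features(rows)
for name, info in features.items():
    print(name, info)
```

In practice, printing a `Dataset` instance is the way to get the actual report; the sketch only shows the kind of per-feature information such a report is built from.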

Mining association rules the easy way (recommended)

Association rule mining can be easily performed using the get_rules function:

from niaarm import get_rules
from niapy.algorithms.basic import DifferentialEvolution

algo = DifferentialEvolution(population_size=50, differential_weight=0.5, crossover_probability=0.9)
metrics = ('support', 'confidence')

rules, run_time = get_rules(data, algo, metrics, max_iters=30, logging=True)

print(rules) # Prints basic stats about the mined rules
print(f'Run Time: {run_time}')
rules.to_csv('output.csv')
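
The `support` and `confidence` metrics named above have standard textbook definitions: support is the fraction of transactions that satisfy both sides of a rule, and confidence is the fraction of transactions satisfying the antecedent that also satisfy the consequent. The following sketch computes both for a numerical rule on made-up data. The `rule_metrics` function and the sample transactions are hypothetical and do not reflect how NiaARM evaluates rules internally.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent => consequent.

    antecedent and consequent are predicates over a single transaction.
    """
    n = len(transactions)
    covered = sum(1 for t in transactions if antecedent(t))
    both = sum(1 for t in transactions if antecedent(t) and consequent(t))
    support = both / n if n else 0.0
    confidence = both / covered if covered else 0.0
    return support, confidence


# Made-up transactions with one categorical and two numerical attributes.
transactions = [
    {"Sex": "M", "Length": 0.45, "Rings": 15},
    {"Sex": "F", "Length": 0.53, "Rings": 9},
    {"Sex": "M", "Length": 0.44, "Rings": 8},
    {"Sex": "I", "Length": 0.33, "Rings": 7},
]


def length_in_range(t):
    # Antecedent: Length in [0.40, 0.50]
    return 0.40 <= t["Length"] <= 0.50


def rings_in_range(t):
    # Consequent: Rings in [10, 15]
    return 10 <= t["Rings"] <= 15


support, confidence = rule_metrics(transactions, length_in_range, rings_in_range)
print(f"support={support}, confidence={confidence}")
```

Two transactions fall inside the `Length` interval and one of them also falls inside the `Rings` interval, so the rule covers one of four transactions overall and half of the transactions matched by its antecedent.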

Mining association rules the hard way

The above example can also be implemented using a lower-level interface, with the NiaARM class directly:

from niaarm import NiaARM, Dataset
from niapy.algorithms.basic import DifferentialEvolution
from niapy.task import Task, OptimizationType


# Create the optimization problem:
# - dimension is the dimension of the problem,
# - features is the list of features and transactions holds the transactions,
# - metrics is a sequence of metrics used to compute the fitness.
# Instead of a sequence, you can pass a dict of the shape {'metric_name': <weight of metric in range [0, 1]>};
# when passing a sequence, all weights default to 1.
problem = NiaARM(data.dimension, data.features, data.transactions, metrics=('support', 'confidence'), logging=True)

# build niapy task
task = Task(problem=problem, max_iters=30, optimization_type=OptimizationType.MAXIMIZATION)

# use Differential Evolution (DE) algorithm from the NiaPy library
# see full list of available algorithms: https://github.com/NiaOrg/NiaPy/blob/master/Algorithms.md
algo = DifferentialEvolution(population_size=50, differential_weight=0.5, crossover_probability=0.9)

# run algorithm
best = algo.run(task=task)

# sort rules
problem.rules.sort()

# export all rules to csv
problem.rules.to_csv('output.csv')
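
The comments in the example above mention that metrics can be given either as a sequence (all weights equal to 1) or as a dict of weights. One plausible way to combine several metrics into a single fitness value is a weighted average, sketched below. The `weighted_fitness` function is hypothetical, and normalizing by the sum of weights is an assumption for illustration, not a statement about NiaARM's internals.

```python
def weighted_fitness(metric_values, metrics):
    """Combine metric values into one fitness value (hypothetical sketch).

    metric_values: dict mapping metric name to its value for one rule.
    metrics: a sequence of metric names (weights default to 1)
             or a dict of the shape {'metric_name': weight}.
    """
    if isinstance(metrics, dict):
        weights = dict(metrics)
    else:
        weights = {name: 1.0 for name in metrics}
    total = sum(weights.values())
    if total == 0:
        return 0.0
    # Assumption: fitness is the weighted average of the selected metrics.
    return sum(weights[name] * metric_values[name] for name in weights) / total


values = {"support": 0.25, "confidence": 0.5}

equal = weighted_fitness(values, ("support", "confidence"))
confidence_only = weighted_fitness(values, {"support": 0.0, "confidence": 1.0})
print(equal, confidence_only)
```

With equal weights both metrics contribute the same amount, while a dict lets one metric dominate or be ignored entirely.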

Visualization

The framework currently supports the hill slopes visualization method presented in [4]. More visualization methods are planned for future releases.

from matplotlib import pyplot as plt
from niaarm import Dataset, get_rules
from niaarm.visualize import hill_slopes

dataset = Dataset('datasets/Abalone.csv')
metrics = ('support', 'confidence')
rules, _ = get_rules(dataset, 'DifferentialEvolution', metrics, max_evals=1000, seed=1234)
some_rule = rules[150]
hill_slopes(some_rule, dataset.transactions)
plt.show()

Text Mining (Experimental)

An experimental implementation of association rule text mining using nature-inspired algorithms, based on ideas from [5], is also provided. The niaarm.text module contains the Corpus and Document classes for loading and preprocessing corpora, the TextRule class, which represents a text rule, and the NiaARTM class, which implements association rule text mining as a continuous optimization problem. The get_text_rules function in the niaarm.mine module is the text mining equivalent of get_rules.

import pandas as pd
from niaarm.text import Corpus
from niaarm.mine import get_text_rules
from niapy.algorithms.basic import ParticleSwarmOptimization

df = pd.read_json('datasets/text/artm_test_dataset.json', orient='records')
documents = df['text'].tolist()
corpus = Corpus.from_list(documents)

algorithm = ParticleSwarmOptimization(population_size=200, seed=123)
metrics = ('support', 'confidence', 'aws')
rules, run_time = get_text_rules(corpus, max_terms=5, algorithm=algorithm, metrics=metrics, max_evals=10000, logging=True)

if len(rules):
    print(rules)
    rules.to_csv('output.csv')
else:
    print('No rules generated')
print(f'Run time: {run_time:.2f}s')

Note: You may need to download stopwords and the punkt tokenizer from nltk by running import nltk; nltk.download('stopwords'); nltk.download('punkt').

For a full list of examples see the examples folder in the GitHub repository.

Command line interface

We provide a simple command line interface that lets you mine association rules on any input dataset, write them to a CSV file, and/or perform a simple statistical analysis on them.

niaarm -h
usage: niaarm [-h] [-v] -i INPUT_FILE [-o OUTPUT_FILE] -a ALGORITHM [-s SEED]
              [--max-evals MAX_EVALS] [--max-iters MAX_ITERS] --metrics
              METRICS [METRICS ...] [--weights WEIGHTS [WEIGHTS ...]] [--log]
              [--show-stats]

Perform ARM, output mined rules as csv, get mined rules' statistics

options:
  -h, --help            show this help message and exit
  -v, --version         show program's version number and exit
  -i INPUT_FILE, --input-file INPUT_FILE
                        Input file containing a csv dataset
  -o OUTPUT_FILE, --output-file OUTPUT_FILE
                        Output file for mined rules
  -a ALGORITHM, --algorithm ALGORITHM
                        Algorithm to use (niapy class name, e.g.
                        DifferentialEvolution)
  -s SEED, --seed SEED  Seed for the algorithm's random number generator
  --max-evals MAX_EVALS
                        Maximum number of fitness function evaluations
  --max-iters MAX_ITERS
                        Maximum number of iterations
  --metrics METRICS [METRICS ...]
                        Metrics to use in the fitness function.
  --weights WEIGHTS [WEIGHTS ...]
                        Weights in range [0, 1] corresponding to --metrics
  --log                 Enable logging of fitness improvements
  --show-stats          Display stats about mined rules

Note: The CLI script can also be run as a Python module (python -m niaarm ...).
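
As an illustration of how the options in the help text fit together, the following sketch assembles a `niaarm` command line from Python and runs it only if the executable and the input file are available. The `build_niaarm_command` helper is hypothetical; it uses only flags listed in the usage output above, and the file names and parameter values are example values.

```python
import os
import shlex
import shutil
import subprocess


def build_niaarm_command(input_file, algorithm, metrics, output_file=None,
                         max_iters=None, seed=None, show_stats=False):
    """Build an argument list for the niaarm CLI (hypothetical helper)."""
    cmd = ["niaarm", "-i", input_file, "-a", algorithm]
    if seed is not None:
        cmd += ["-s", str(seed)]
    if max_iters is not None:
        cmd += ["--max-iters", str(max_iters)]
    # --metrics accepts one or more metric names.
    cmd += ["--metrics", *metrics]
    if output_file:
        cmd += ["-o", output_file]
    if show_stats:
        cmd.append("--show-stats")
    return cmd


command = build_niaarm_command(
    "datasets/Abalone.csv",
    "DifferentialEvolution",
    ("support", "confidence"),
    output_file="output.csv",
    max_iters=30,
    seed=1234,
    show_stats=True,
)
print(shlex.join(command))

# Run the command only when niaarm is installed and the dataset exists.
if shutil.which("niaarm") and os.path.exists("datasets/Abalone.csv"):
    subprocess.run(command, check=True)
```

The printed line can also be pasted directly into a shell.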

Reference Papers:

Ideas are based on the following research papers:

[1] I. Fister Jr., A. Iglesias, A. Gálvez, J. Del Ser, E. Osaba, I. Fister. Differential evolution for association rule mining using categorical and numerical attributes. In: Intelligent data engineering and automated learning - IDEAL 2018, pp. 79-88, 2018.

[2] I. Fister Jr., V. Podgorelec, I. Fister. Improved Nature-Inspired Algorithms for Numeric Association Rule Mining. In: Vasant P., Zelinka I., Weber GW. (eds) Intelligent Computing and Optimization. ICO 2020. Advances in Intelligent Systems and Computing, vol 1324. Springer, Cham.

[3] I. Fister Jr., I. Fister. A brief overview of swarm intelligence-based algorithms for numerical association rule mining. arXiv preprint arXiv:2010.15524 (2020).

[4] I. Fister et al. Visualization of Numerical Association Rules by Hill Slopes. In: Analide, C., Novais, P., Camacho, D., Yin, H. (eds) Intelligent Data Engineering and Automated Learning – IDEAL 2020. IDEAL 2020. Lecture Notes in Computer Science, vol 12489. Springer, Cham. https://doi.org/10.1007/978-3-030-62362-3_10

[5] I. Fister, S. Deb, I. Fister. Population-based metaheuristics for Association Rule Text Mining. In: Proceedings of the 2020 4th International Conference on Intelligent Systems, Metaheuristics & Swarm Intelligence, New York, NY, USA, Mar. 2020, pp. 19–23. doi: 10.1145/3396474.3396493.

License

This package is distributed under the MIT License. This license can be found online at http://www.opensource.org/licenses/MIT.

Disclaimer

This framework is provided as-is, and there are no guarantees that it fits your purposes or that it is bug-free. Use it at your own risk!
