
Implementation of several preprocessing techniques for Association Rule Mining (ARM)


arm-preprocessing

arm-preprocessing is a lightweight Python library supporting several key steps involving data preparation, manipulation, and discretisation for Association Rule Mining (ARM). 🧠 Embrace its minimalistic design that prioritises simplicity. 💡 The framework is intended to be fully extensible and offers seamless integration with related ARM libraries (e.g., NiaARM). 🔗

  • Free software: MIT license
  • Documentation: http://arm-preprocessing.readthedocs.io
  • Python: 3.9.x, 3.10.x, 3.11.x, 3.12.x
  • Tested OS: Windows, Ubuntu, Fedora, Alpine, Arch, macOS (it may also work on other systems)

💡 Why arm-preprocessing?

While numerous libraries facilitate data mining preprocessing tasks, this library is designed to integrate seamlessly with association rule mining. It harmonises well with the NiaARM library, a robust numerical association rule mining framework. The primary aim is to bridge the gap between preprocessing and rule mining, simplifying the workflow. Additionally, its design allows for the effortless incorporation of new preprocessing methods and fast benchmarking.

✨ Key features

  • Loading various formats of datasets (CSV, JSON, TXT, TCX) 📊
  • Converting datasets to different formats 🔄
  • Loading different types of datasets (numerical dataset, discrete dataset, time-series data, text, etc.) 📉
  • Dataset identification (detecting the dataset type) 🔍
  • Dataset statistics 📈
  • Discretisation methods 📏
  • Data squashing methods 🤏
  • Feature scaling methods ⚖️
  • Feature selection methods 🎯

📦 Installation

pip

To install arm-preprocessing with pip, use:

pip install arm-preprocessing

To install arm-preprocessing on Alpine Linux, please use:

$ apk add py3-arm-preprocessing

To install arm-preprocessing on Arch Linux, please use an AUR helper:

$ yay -Syyu python-arm-preprocessing

🚀 Usage

Data loading

The following example demonstrates how to load a dataset from a file (csv, json, txt). More examples can be found in the examples/data_loading directory:

from arm_preprocessing.dataset import Dataset

# Initialise dataset with filename (without format) and format (csv, json, txt)
dataset = Dataset('path/to/datasets', format='csv')

# Load dataset
dataset.load_data()
df = dataset.data

Missing values

The following example demonstrates how to handle missing values in a dataset using imputation. More examples can be found in the examples/missing_values directory:

from arm_preprocessing.dataset import Dataset

# Initialise dataset with filename and format
dataset = Dataset('examples/missing_values/data', format='csv')
dataset.load()

# Impute missing data
dataset.missing_values(method='impute')
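
Imputation typically replaces each missing entry with a statistic of the observed values in the same column, such as the mean. The snippet below is a minimal pure-Python sketch of mean imputation for illustration only; it is not the library's internal implementation, and `impute_mean` is a hypothetical helper name:

```python
def impute_mean(column):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]

print(impute_mean([1.0, None, 3.0]))  # [1.0, 2.0, 3.0]
```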

Data discretisation

The following example demonstrates how to discretise a dataset using the equal width method. More examples can be found in the examples/discretisation directory:

from arm_preprocessing.dataset import Dataset

# Initialise dataset with filename (without format) and format (csv, json, txt)
dataset = Dataset('datasets/sportydatagen', format='csv')
dataset.load_data()

# Discretise dataset using equal width discretisation
dataset.discretise(method='equal_width', num_bins=5, columns=['calories'])
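
Equal width discretisation splits a column's range into num_bins intervals of identical width and maps every value to its interval index. A standalone sketch of the idea (illustrative only, not the library's code; `equal_width_bins` is a hypothetical name):

```python
def equal_width_bins(values, num_bins):
    """Assign each value to one of num_bins equally wide intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins
    if width == 0:  # all values identical: everything lands in bin 0
        return [0] * len(values)
    # the maximum value would fall just past the last bin, so clamp it
    return [min(int((v - lo) / width), num_bins - 1) for v in values]

print(equal_width_bins([0, 2, 5, 9, 10], 5))  # [0, 1, 2, 4, 4]
```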

Data squashing

The following example demonstrates how to squash a dataset using Euclidean similarity. More examples can be found in the examples/squashing directory:

from arm_preprocessing.dataset import Dataset

# Initialise dataset with filename and format
dataset = Dataset('datasets/breast', format='csv')
dataset.load()

# Squash dataset
dataset.squash(threshold=0.75, similarity='euclidean')
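
Data squashing compresses a dataset by merging groups of mutually similar rows into single representative rows [1]. The sketch below illustrates one plausible greedy variant using a 1/(1 + distance) Euclidean similarity; the library's actual algorithm and its exact similarity definition may differ:

```python
import math

def euclid_sim(a, b):
    """Similarity in (0, 1]: 1 for identical rows, smaller as distance grows."""
    d = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return 1.0 / (1.0 + d)

def squash(rows, threshold):
    """Greedily assign each row to the first group whose representative
    is similar enough; otherwise start a new group."""
    groups = []  # list of (representative, members)
    for row in rows:
        for rep, members in groups:
            if euclid_sim(rep, row) >= threshold:
                members.append(row)
                break
        else:
            groups.append((row, [row]))
    # replace each group by the column-wise mean of its members
    return [[sum(col) / len(members) for col in zip(*members)]
            for _, members in groups]

rows = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]]
print(squash(rows, threshold=0.75))  # the two nearby rows are merged
```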

Feature scaling

The following example demonstrates how to scale the dataset's features. More examples can be found in the examples/scaling directory:

from arm_preprocessing.dataset import Dataset

# Initialise dataset with filename and format
dataset = Dataset('datasets/Abalone', format='csv')
dataset.load()

# Scale dataset using normalisation
dataset.scale(method='normalisation')
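
Normalisation is assumed here to mean min-max rescaling of each feature into [0, 1], a common convention. A minimal sketch of the idea (not the library's code; `normalise` is a hypothetical name):

```python
def normalise(values):
    """Min-max normalisation: rescale values linearly into [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

print(normalise([2.0, 4.0, 6.0]))  # [0.0, 0.5, 1.0]
```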

Feature selection

The following example demonstrates how to select features from a dataset. More examples can be found in the examples/feature_selection directory:

from arm_preprocessing.dataset import Dataset

# Initialise dataset with filename and format
dataset = Dataset('datasets/sportydatagen', format='csv')
dataset.load()

# Feature selection
dataset.feature_selection(
    method='kendall', threshold=0.15, class_column='calories')
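
Kendall-based feature selection keeps only those features whose rank correlation with the class column is strong enough. The sketch below illustrates the idea with Kendall's tau-a (ties ignored); `kendall_tau`, `select_features`, and the sample data are hypothetical, not the library's API:

```python
from itertools import combinations

def kendall_tau(x, y):
    """Kendall tau-a: (concordant - discordant) / total pairs, ignoring ties."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)

def select_features(features, target, threshold):
    """Keep features whose |tau| with the target meets the threshold."""
    return [name for name, col in features.items()
            if abs(kendall_tau(col, target)) >= threshold]

features = {'steps': [1, 2, 3, 4], 'noise': [3, 1, 4, 2]}
target = [10, 20, 30, 40]
print(select_features(features, target, threshold=0.15))  # ['steps']
```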

🔗 Related frameworks

[1] NiaARM: A minimalistic framework for Numerical Association Rule Mining

[2] uARMSolver: universal Association Rule Mining Solver

📚 References

[1] I. Fister, I. Fister Jr., D. Novak and D. Verber, Data squashing as preprocessing in association rule mining, 2022 IEEE Symposium Series on Computational Intelligence (SSCI), Singapore, Singapore, 2022, pp. 1720-1725, doi: 10.1109/SSCI51031.2022.10022240.

[2] I. Fister Jr. and I. Fister, A brief overview of swarm intelligence-based algorithms for numerical association rule mining, arXiv preprint arXiv:2010.15524 (2020).

🔑 License

This package is distributed under the MIT License. This license can be found online at http://www.opensource.org/licenses/MIT.

Disclaimer

This framework is provided as-is, and there are no guarantees that it fits your purposes or that it is bug-free. Use it at your own risk!

