
A hybrid rule-guided k-Nearest Neighbor classifier for improved performance.

Project description

RACEkNN: A Hybrid Rule-Guided k-Nearest Neighbor Classifier

License: MIT

This repository contains the official Python implementation for the paper: "RACEkNN: A hybrid approach for improving the effectiveness of the k-nearest neighbor algorithm".

RACEkNN is a hybrid classifier that integrates kNN with RACER (Rule Aggregating ClassifiEr), a novel rule-based classifier. RACER generates generalized rules to identify the most relevant subset of the training data for a given test instance. This pre-selection significantly reduces the search space for kNN, leading to faster execution times and improved classification accuracy.
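The pre-selection idea can be sketched in a few lines of plain Python. This is a toy illustration, not the package's algorithm: a single hand-written matching rule stands in for RACER's learned rules, and Hamming distance serves as the kNN metric for categorical features. The function `rule_guided_knn` and the sample data are hypothetical.

```python
from collections import Counter

def rule_guided_knn(train_X, train_y, test_x, rule_feature, k=3):
    # "Rule": keep only training rows that agree with the test instance
    # on one feature -- this mimics RACER narrowing the search space.
    subset = [(x, y) for x, y in zip(train_X, train_y)
              if x[rule_feature] == test_x[rule_feature]]
    if not subset:  # fall back to the full training set if nothing matches
        subset = list(zip(train_X, train_y))
    # Hamming distance suits categorical features (e.g., Car Evaluation).
    dist = lambda a, b: sum(ai != bi for ai, bi in zip(a, b))
    nearest = sorted(subset, key=lambda pair: dist(pair[0], test_x))[:k]
    # Majority vote among the k nearest neighbors of the pre-selected subset.
    return Counter(label for _, label in nearest).most_common(1)[0][0]

train_X = [("high", "2"), ("high", "4"), ("low", "2"), ("low", "4"), ("high", "4")]
train_y = ["unacc", "acc", "unacc", "acc", "acc"]
print(rule_guided_knn(train_X, train_y, ("high", "4"), rule_feature=0))  # acc
```

Because the distance computation runs only over the matched subset, the kNN step touches far fewer rows than a scan of the whole training set, which is where the runtime saving comes from.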


📖 About the Paper

Title: RACEkNN: A hybrid approach for improving the effectiveness of the k-nearest neighbor algorithm
Journal: Knowledge-Based Systems, Volume 301, 2024
DOI: 10.1016/j.knosys.2024.112357
Authors: Mahdiyeh Ebrahimi, Alireza Basiri

Abstract

Classification is a fundamental task in data mining, involving the prediction of class labels for new data. k-Nearest Neighbors (kNN), a lazy learning algorithm, is sensitive to data distribution and suffers from high computational costs due to the requirement of finding the closest neighbors across the entire training set. Recent advancements in classification techniques have led to the development of hybrid algorithms that combine the strengths of multiple methods to address specific limitations. In response to the inherent execution time constraint of kNN and the impact of data distribution on its performance, we propose RACEkNN (Rule Aggregating ClassifiEr kNN), a hybrid solution that integrates kNN with RACER, a newly devised rule-based classifier. RACER improves predictive capability and decreases kNN’s runtime by creating more generalized rules, each encompassing a subset of training instances with similar characteristics. During prediction, a test instance is compared to these rules based on its features. By selecting the rule with the closest match, the test instance identifies the most relevant subset of training data for kNN. This significantly reduces the data kNN needs to consider, leading to faster execution times and enhanced prediction accuracy. Empirical findings demonstrate that RACEkNN outperforms kNN in terms of both runtime and accuracy. Additionally, it surpasses RACER, four well-known classifiers, and certain kNN-based methods in terms of accuracy.


🚀 Installation

To get started, clone the repository and install the required dependencies.

  1. Clone the repository:

    git clone https://github.com/mahdiyehebrahimi/RACEkNN.git
    cd RACEkNN
    
  2. Install dependencies: It is recommended to use a virtual environment.

    pip install -r requirements.txt
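A minimal sequence for a POSIX shell (the `pip install raceknn` alternative assumes the published PyPI package is wanted instead of a source checkout):

```shell
python -m venv .venv                # create an isolated environment
source .venv/bin/activate           # on Windows: .venv\Scripts\activate
pip install -r requirements.txt     # or, straight from PyPI: pip install raceknn
```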
    

💡 Usage Example

You can use RACEKNNClassifier just like any other scikit-learn classifier. Here is a simple example using the "Car Evaluation" dataset included in the Datasets/ directory.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from raceknn import RACEKNNClassifier

# Load data
df = pd.read_csv(
    "Datasets/car_evaluation.data",
    names=["buying", "maint", "doors", "persons", "lug_boot", "safety", "class"]
)
X = df.drop(columns=['class'])
y = df['class']

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

# Initialize and fit the classifier
# alpha: RACER fitness trade-off (accuracy vs. coverage)
# k: Number of neighbors for the final kNN vote
clf = RACEKNNClassifier(alpha=0.9, k=5)
clf.fit(X_train, y_train)

# Predict and evaluate
y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy of RACEKNN Classifier: {accuracy:.4f}")

For more examples, including k-fold cross-validation, see example.py.


🎓 Citing This Work

If you use RACEkNN in your research, please cite our paper.

BibTeX

@article{EBRAHIMI2024112357,
  title = {RACEkNN: A hybrid approach for improving the effectiveness of the k-nearest neighbor algorithm},
  journal = {Knowledge-Based Systems},
  volume = {301},
  pages = {112357},
  year = {2024},
  issn = {0950-7051},
  doi = {10.1016/j.knosys.2024.112357},
  author = {Mahdiyeh Ebrahimi and Alireza Basiri}
}

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.



Download files

Source Distribution

raceknn-0.1.1.tar.gz (12.7 kB)

  • Uploaded via: twine/6.1.0, CPython/3.13.5 (Trusted Publishing: no)
  • SHA256: 462a06414f03a3407a164da3b87ccc890aed930dc513c03fb6bf13dc81c6bc8c
  • MD5: a156e904f175b2ae1aa6929d3dcd89ef
  • BLAKE2b-256: 030a8db2a10b21692525155e76b0f5577e242a68a2eaf54d6761a9ec449c74fe

Built Distribution

raceknn-0.1.1-py3-none-any.whl (10.7 kB, Python 3)

  • Uploaded via: twine/6.1.0, CPython/3.13.5 (Trusted Publishing: no)
  • SHA256: 5baef1b9380e72ca1ee260b3343f2b486f8f0fa775f8713156eec89855731a4c
  • MD5: a697a465ed1807b2c1f5df31244f8e16
  • BLAKE2b-256: f59a9fa54a34849244eadffaac13fb56ed9f2355650f1bc553f193a5bb31f93d
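A downloaded archive can be checked against the SHA256 digests above with the standard library alone; this helper is illustrative, not part of the package:

```python
import hashlib

def sha256_of(path: str) -> str:
    """Return the hex SHA256 digest of a file, read in 8 KiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Compare against the digest listed for the source distribution, e.g.:
# sha256_of("raceknn-0.1.1.tar.gz") == "462a06414f03a3407a164da3b87ccc890aed930dc513c03fb6bf13dc81c6bc8c"
```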
