A smarter, weighted, feature-selective KNN algorithm with automatic preprocessing.

SmartKNN

A smarter, weighted, feature-selective KNN algorithm that automatically learns feature importance, filters weak features, handles missing values, and normalizes data. It delivers significantly better accuracy than classical KNN, all through a simple sklearn-like API.

SmartKNN supports both classification and regression, requires zero manual tuning for preprocessing, and is fully compatible with NumPy and Pandas.



Features

  • Automatic Feature Weighting

    • Univariate MSE scoring
    • Mutual Information
    • Random Forest importance
  • Automatic Preprocessing

    • Normalization
    • NaN / Inf cleaning
    • Median imputation
    • Value clipping
  • Automatic Feature Filtering

    • Removes low-weight & noisy features
    • Keeps only important signals
  • Weighted Euclidean Distance

  • Scikit-Learn Style API

    • fit()
    • predict()
    • kneighbors()
  • Supports

    • NumPy arrays
    • Pandas DataFrames
    • Regression + Classification
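
To make the first weighting signal above concrete, here is a minimal sketch of univariate MSE scoring. It is not SmartKNN's actual implementation: the function name `univariate_mse_weights` and the inverse-error weighting are assumptions chosen for illustration. Each feature gets a one-variable least-squares fit to the target, and features with lower error receive higher weight.

```python
import numpy as np

def univariate_mse_weights(X, y, eps=1e-12):
    """Hypothetical helper: weight features by how well each one alone predicts y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = X - X.mean(axis=0)            # centre each column
    yc = y - y.mean()                  # centre the target
    var = (xc ** 2).sum(axis=0)        # per-feature sum of squares
    slope = (xc * yc[:, None]).sum(axis=0) / (var + eps)
    resid = yc[:, None] - xc * slope   # residuals of each single-feature fit
    mse = (resid ** 2).mean(axis=0)
    scores = 1.0 / (mse + eps)         # lower error -> higher score
    return scores / scores.sum()       # normalise so the weights sum to 1

# Feature 0 drives the target; feature 1 is pure noise.
rng = np.random.default_rng(0)
x0 = rng.normal(size=200)
x1 = rng.normal(size=200)
y_demo = 3.0 * x0 + 0.1 * rng.normal(size=200)
w_demo = univariate_mse_weights(np.column_stack([x0, x1]), y_demo)
```

In this toy setup the informative feature should receive nearly all of the weight, which is the behaviour a weighted distance relies on.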

Installation

Install from PyPI

pip install smart-knn

Local install

pip install .

Quick Start

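import pandas as pd
from smart_knn import SmartKNN

# Load a dataset that has a "target" column
df = pd.read_csv("data.csv")
X = df.drop("target", axis=1)
y = df["target"]

# Fit with 5 neighbors; preprocessing and feature weighting run automatically
model = SmartKNN(k=5)
model.fit(X, y)

# Predict a single row
sample = X.iloc[0]
pred = model.predict(sample)
print("Prediction:", pred)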

SmartKNN will automatically:

  • Normalize inputs
  • Learn weights
  • Clean NaN/Inf
  • Filter weak features
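
The cleaning and normalization steps listed above can be sketched in plain NumPy. This is only an illustration under stated assumptions: the helper names `fit_preprocessor` and `transform`, the z-score normalization, and the clip value of 5 standard deviations are hypothetical and may differ from what SmartKNN does internally.

```python
import numpy as np

def fit_preprocessor(X, clip=5.0):
    """Hypothetical helper: learn per-column medians, means and stds."""
    X = np.asarray(X, dtype=float).copy()
    X[~np.isfinite(X)] = np.nan                        # treat +/-Inf like NaN
    median = np.nanmedian(X, axis=0)
    median = np.where(np.isnan(median), 0.0, median)   # all-missing columns
    X = np.where(np.isnan(X), median, X)               # median imputation
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0                                # avoid division by zero
    return {"median": median, "mean": mean, "std": std, "clip": clip}

def transform(X, params):
    """Apply the learned cleaning, imputation, scaling and clipping."""
    X = np.asarray(X, dtype=float).copy()
    X[~np.isfinite(X)] = np.nan
    X = np.where(np.isnan(X), params["median"], X)
    Z = (X - params["mean"]) / params["std"]           # z-score normalisation
    return np.clip(Z, -params["clip"], params["clip"])

raw = np.array([[1.0, np.inf], [np.nan, 2.0], [3.0, 4.0]])
params = fit_preprocessor(raw)
clean = transform(raw, params)
```

Learning the statistics once and reusing them at prediction time keeps query rows on the same scale as the training data.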

Predict Multiple Rows

preds = model.predict(X.iloc[:10])
print(preds)

Note on Classification (Temporary — v0.2.x)

SmartKNN was originally designed to auto-detect classification vs regression based on the target values.
In rare cases, integer-valued regression datasets (e.g., energy = 0, 1, 2, 3) could be mistaken for classification and cause errors when evaluated using sklearn metrics.

To guarantee stability and zero breaking changes for current users, SmartKNN now:

  • Works reliably with both regression and classification inputs
  • Uses safe numeric prediction output by default
  • Avoids sklearn "continuous vs multiclass" errors automatically

If using SmartKNN for classification, simply map predictions back to class labels:

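# X_test: held-out rows with the same feature columns used in fit()
preds = model.predict(X_test)
# Round the numeric output to the nearest integer class label
preds = preds.round().astype(int)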

A full enhanced classification engine (with probability vote + label-safe decoding) will be released in a future update.
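
The rounding step above assumes class labels are already integers. For string or other non-numeric labels, one possible workaround is to encode them before fitting and decode after rounding. The sketch below uses only NumPy; the helper name `decode_predictions` is hypothetical and not part of SmartKNN.

```python
import numpy as np

labels = np.array(["cat", "dog", "bird", "dog", "cat"])

# Encode: classes is the sorted set of labels, y_encoded holds integer codes
classes, y_encoded = np.unique(labels, return_inverse=True)

def decode_predictions(numeric_preds, classes):
    """Hypothetical helper: map numeric predictions back to class labels."""
    codes = np.rint(np.asarray(numeric_preds, dtype=float)).astype(int)
    codes = np.clip(codes, 0, len(classes) - 1)   # guard against out-of-range codes
    return classes[codes]

# Stand-in for model.predict(X_test) after fitting on y_encoded
decoded = decode_predictions([0.2, 1.6, 2.9], classes)
```

Note that rounding treats the integer codes as ordered, so with more than two classes a prediction that falls between two codes can land on an unrelated class. This is a stopgap, not a substitute for a vote-based classifier.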


How SmartKNN Works

  1. Learns feature importance (MSE + MI + Random Forest).
  2. Removes weak features.
  3. Normalizes input.
  4. Applies weighted Euclidean distance.
  5. Runs inference with optimized, vectorized NumPy operations.

Intended results, compared with classical KNN:

  • Higher accuracy
  • Faster prediction
  • Lower noise sensitivity
  • Better generalization
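
The filtering and distance steps above can be sketched as a small vectorized function. This is a simplified illustration, not the library's code: `weighted_knn_predict` is a hypothetical name, the inputs are assumed to be already normalized, and the prediction is a plain average of neighbor targets.

```python
import numpy as np

def weighted_knn_predict(X_train, y_train, X_query, weights, k=5, weight_threshold=0.05):
    """Hypothetical sketch: masked, weighted Euclidean KNN with mean prediction."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    X_query = np.atleast_2d(np.asarray(X_query, dtype=float))
    weights = np.asarray(weights, dtype=float)

    mask = weights >= weight_threshold            # drop weak features
    Xt, Xq, w = X_train[:, mask], X_query[:, mask], weights[mask]

    # Squared weighted distances via broadcasting, shape (n_query, n_train)
    diff = Xq[:, None, :] - Xt[None, :, :]
    d2 = (w * diff ** 2).sum(axis=2)

    k = min(k, Xt.shape[0])
    idx = np.argpartition(d2, k - 1, axis=1)[:, :k]   # k nearest, not sorted
    dists = np.sqrt(np.take_along_axis(d2, idx, axis=1))
    preds = y_train[idx].mean(axis=1)                 # average neighbor targets
    return preds, idx, dists

# Feature 1 is noise with zero weight, so only feature 0 decides the neighbors.
X_tr = np.array([[0.0, 100.0], [1.0, -50.0], [10.0, 0.0], [11.0, 3.0]])
y_tr = np.array([0.0, 0.0, 10.0, 10.0])
preds_demo, idx_demo, dists_demo = weighted_knn_predict(
    X_tr, y_tr, [[0.5, 0.0], [10.4, 999.0]], weights=[1.0, 0.0], k=2
)
```

Because the noisy second feature is masked out, its large values do not affect which neighbors are chosen. The broadcasted difference array uses memory proportional to queries × training rows × features, so large datasets would need batching.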

API Overview

Initialize

model = SmartKNN(k=5, weight_threshold=0.05)

Fit

model.fit(X, y)

Predict

model.predict(sample)

Neighbors

idx, dists = model.kneighbors(sample)

Inspect Model

model.weights_
model.feature_mask_
model.X_.shape

Hyperparameters

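| Parameter        | Description                                    | Range |
|------------------|------------------------------------------------|-------|
| k                | Number of neighbors                            | 3–15  |
| weight_threshold | Drop features whose weight is below this value | 0–0.2 |
| alpha            | Importance of the MSE score                    | 0–1   |
| beta             | Importance of the MI score                     | 0–1   |
| gamma            | Importance of the Random Forest score          | 0–1   |
| n_jobs           | Parallel workers                               | 1–8   |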
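
To show how alpha, beta, gamma, and weight_threshold might interact, here is a hedged sketch that blends three per-feature score vectors and derives a feature mask. The function `blend_scores`, the default coefficients, and the normalize-then-sum formula are assumptions for illustration; SmartKNN's real combination rule may differ.

```python
import numpy as np

def blend_scores(mse_score, mi_score, rf_score, alpha=0.4, beta=0.3, gamma=0.3):
    """Hypothetical helper: combine three importance signals into one weight vector."""
    def normalise(s):
        s = np.clip(np.asarray(s, dtype=float), 0.0, None)   # no negative scores
        total = s.sum()
        # Fall back to uniform weights if every score is zero
        return s / total if total > 0 else np.full(s.shape, 1.0 / s.size)

    combined = (
        alpha * normalise(mse_score)
        + beta * normalise(mi_score)
        + gamma * normalise(rf_score)
    )
    return normalise(combined)

# Made-up scores for three features
weights = blend_scores([0.9, 0.1, 0.0], [0.7, 0.2, 0.1], [0.8, 0.15, 0.05])
feature_mask = weights >= 0.05   # same role as weight_threshold=0.05
```

Normalizing each signal before blending keeps one scoring method from dominating just because its raw values are on a larger scale.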

📁 Project Structure

smart_knn/
 ├── base_knn.py
 ├── distance.py
 ├── weight_learning.py
 ├── data_processing.py
 ├── utils.py
 ├── evaluation.py
 ├── adaptive_k.py
 ├── prototypes.py
 └── signatures.py

docs/
 ├── design.md
 ├── theory.md
 ├── roadmap.md
 └── usage.md

benchmarks/
 ├── classification_tests/
 ├── regression_tests/
 └── heatmaps/

Benchmark Visuals

![Accuracy Heatmap](benchmarks/heatmaps/class_accuracy.png)
![Regression MSE](benchmarks/heatmaps/reg_mse.png)


Roadmap

  • Adaptive-K
  • Prototype compression
  • Neural metric learning
  • FAISS / HNSW accelerated search
  • GPU support
  • Distance signatures
  • Incremental learning

License

SmartKNN is released under the MIT License. See the LICENSE file for details.


Contributing

PRs and feature requests are welcome! If you like SmartKNN, star the repository.

