A smarter, weighted, feature-selective KNN algorithm with automatic preprocessing.
SmartKNN
A smarter, weighted, feature-selective KNN algorithm that automatically learns feature importance, filters weak features, handles missing values, and normalizes data. It is designed to deliver better accuracy than classical KNN, all with a simple sklearn-like API.
SmartKNN supports both classification and regression, requires zero manual tuning for preprocessing, and is fully compatible with NumPy and Pandas.
Features
- Automatic Feature Weighting
  - Univariate MSE scoring
  - Mutual Information
  - Random Forest importance
- Automatic Preprocessing
  - Normalization
  - NaN / Inf cleaning
  - Median imputation
  - Value clipping
- Automatic Feature Filtering
  - Removes low-weight & noisy features
  - Keeps only important signals
- Weighted Euclidean Distance
- Scikit-Learn Style API
  - fit(), predict(), kneighbors()
- Supports
  - NumPy arrays
  - Pandas DataFrames
  - Regression + Classification
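To make "univariate MSE scoring" concrete, here is a minimal sketch of one plausible reading: fit a one-variable least-squares line per feature and score the feature by the inverse of its error. The function name `univariate_mse_scores` and the inverse-error scoring rule are assumptions for illustration, not SmartKNN's actual implementation.

```python
import numpy as np

def univariate_mse_scores(X, y):
    """Hypothetical helper: score each feature by how well it alone predicts y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    scores = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        # Least-squares fit of y ~ a * x_j + b using only feature j.
        A = np.column_stack([X[:, j], np.ones(len(y))])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        mse = np.mean((y - A @ coef) ** 2)
        # Lower error means a more useful feature, so invert it (assumed rule).
        scores[j] = 1.0 / (mse + 1e-8)
    # Normalize so the scores sum to 1.
    return scores / scores.sum()

# Synthetic data: only feature 0 carries signal.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
y_demo = 3.0 * X_demo[:, 0] + 0.1 * rng.normal(size=200)
demo_scores = univariate_mse_scores(X_demo, y_demo)
```

On this synthetic data, nearly all of the score mass should land on feature 0, since the other two columns are pure noise.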
Installation
Install from PyPI
pip install smart-knn
Local install
pip install .
Quick Start
import pandas as pd
from smart_knn import SmartKNN
df = pd.read_csv("data.csv")
X = df.drop("target", axis=1)
y = df["target"]
model = SmartKNN(k=5)
model.fit(X, y)
sample = X.iloc[0]
pred = model.predict(sample)
print("Prediction:", pred)
SmartKNN will automatically:
- Normalize inputs
- Learn weights
- Clean NaN/Inf
- Filter weak features
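The preprocessing steps in this list can be pictured with a short NumPy sketch. This is an illustration under stated assumptions, not the library's code: the helper name `clean_and_normalize`, the use of z-score normalization, and the clip limit of 5 standard deviations are all hypothetical choices.

```python
import numpy as np

def clean_and_normalize(X, clip=5.0):
    """Hypothetical helper: impute, standardize, and clip a 2-D float array."""
    X = np.asarray(X, dtype=float).copy()
    # Treat +/-Inf like missing values so both are imputed the same way.
    X[~np.isfinite(X)] = np.nan
    # Median imputation per column (an all-NaN column falls back to 0).
    medians = np.nan_to_num(np.nanmedian(X, axis=0), nan=0.0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = medians[cols]
    # Z-score normalization; constant columns use a scale of 1 to avoid dividing by zero.
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0
    Z = (X - mean) / std
    # Clip extreme values so outliers cannot dominate distances.
    return np.clip(Z, -clip, clip)

X_raw = np.array([[1.0, 10.0],
                  [2.0, np.nan],
                  [3.0, np.inf],
                  [4.0, 30.0]])
X_clean = clean_and_normalize(X_raw)
```

In this example the NaN and Inf entries in the second column are both replaced by the column median (20.0) before scaling, so they end up at 0 after normalization.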
Predict Multiple Rows
preds = model.predict(X.iloc[:10])
print(preds)
Note on Classification (Temporary — v0.2.x)
SmartKNN was originally designed to auto-detect classification vs regression based on the target values.
In rare cases, integer-valued regression datasets (e.g., energy = 0, 1, 2, 3) could be mistaken for classification and cause errors when evaluated using sklearn metrics.
To guarantee stability and zero breaking changes for current users, SmartKNN now:
- Works reliably with both regression and classification inputs
- Uses safe numeric prediction output by default
- Avoids sklearn "continuous vs multiclass" errors automatically
If using SmartKNN for classification, simply map predictions back to class labels:
preds = model.predict(X_test)
preds = preds.round().astype(int)
A full enhanced classification engine (with probability vote + label-safe decoding) will be released in a future update.
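If the class labels are not already integers, one way to apply the rounding step is to encode labels as integer codes before fitting and decode the rounded predictions afterwards. The sketch below uses plain NumPy. The helper `decode_predictions` is hypothetical, and `numeric_preds` stands in for whatever `model.predict` returns.

```python
import numpy as np

labels = np.array(["cat", "dog", "bird", "dog", "cat"])
# np.unique returns the sorted classes and, with return_inverse, an integer code per label.
classes, y_encoded = np.unique(labels, return_inverse=True)

def decode_predictions(numeric_preds, classes):
    """Hypothetical helper: round numeric outputs to the nearest valid class index."""
    idx = np.rint(numeric_preds).astype(int)
    # Guard against predictions that round outside the known class range.
    idx = np.clip(idx, 0, len(classes) - 1)
    return classes[idx]

# Stand-in for numeric model output (assumed values, for illustration only).
numeric_preds = np.array([0.1, 1.9, 1.2, -0.3, 2.4])
decoded = decode_predictions(numeric_preds, classes)
```

Here `y_encoded` is what would be passed to `fit` as the target. The clipping step is a defensive choice so that a slightly out-of-range numeric prediction still maps to a valid label.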
How SmartKNN Works
- Learns feature importance (MSE + MI + Random Forest).
- Removes weak features.
- Normalizes input.
- Applies weighted Euclidean distance.
- Runs inference with vectorized NumPy operations.
Intended benefits over classical KNN:
- Higher accuracy
- Faster prediction
- Lower noise sensitivity
- Better generalization
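The weighted Euclidean distance step can be sketched in a few lines of vectorized NumPy. This is a minimal regression-style illustration, assuming per-feature weights are already known. The function name `weighted_knn_predict` and its return values are hypothetical and do not mirror SmartKNN's internals.

```python
import numpy as np

def weighted_knn_predict(X_train, y_train, x_query, weights, k=3):
    """Hypothetical helper: mean target of the k nearest rows under weighted distance."""
    # Squared weighted Euclidean distance: sum_j w_j * (x_j - q_j)^2, for all rows at once.
    diff = X_train - x_query
    d2 = (diff * diff) @ weights
    # argpartition finds the k smallest distances without a full sort.
    nearest = np.argpartition(d2, k - 1)[:k]
    return y_train[nearest].mean(), nearest, np.sqrt(d2[nearest])

X_train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 10.0], [5.0, 5.0]])
y_train = np.array([1.0, 2.0, 30.0, 40.0])
query = np.array([0.0, 0.0])

# A zero weight on feature 1 removes it from the distance entirely.
pred_a, idx_a, dist_a = weighted_knn_predict(X_train, y_train, query, np.array([1.0, 0.0]), k=2)
# With equal weights, feature 1 pushes row 2 far away from the query.
pred_b, idx_b, dist_b = weighted_knn_predict(X_train, y_train, query, np.array([1.0, 1.0]), k=2)
```

Changing the weights changes which rows count as neighbors: with feature 1 switched off, rows 0 and 2 tie at distance 0, while with equal weights the neighbors are rows 0 and 1.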
API Overview
Initialize
model = SmartKNN(k=5, weight_threshold=0.05)
Fit
model.fit(X, y)
Predict
model.predict(sample)
Neighbors
idx, dists = model.kneighbors(sample)
Inspect Model
model.weights_
model.feature_mask_
model.X_.shape
Hyperparameters
| Parameter | Description | Range |
|---|---|---|
| k | Number of neighbors | 3–15 |
| weight_threshold | Drop features below weight | 0–0.2 |
| alpha | MSE weight importance | 0–1 |
| beta | MI importance | 0–1 |
| gamma | RF importance | 0–1 |
| n_jobs | Parallel workers | 1–8 |
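To show how `alpha`, `beta`, `gamma`, and `weight_threshold` might interact, here is a hedged sketch that blends three per-feature score vectors and masks out low-weight features. The blending formula, the default values, and the helper name `combine_weights` are assumptions for illustration, since the README does not specify how the library combines the scores.

```python
import numpy as np

def combine_weights(mse_scores, mi_scores, rf_scores,
                    alpha=0.4, beta=0.3, gamma=0.3, weight_threshold=0.05):
    """Hypothetical helper: blend three importance signals and filter weak features."""
    def normalize(s):
        # Clamp negatives to 0 and rescale to sum to 1 (uniform if everything is 0).
        s = np.clip(np.asarray(s, dtype=float), 0.0, None)
        total = s.sum()
        return s / total if total > 0 else np.full(s.shape, 1.0 / s.size)

    blended = (alpha * normalize(mse_scores)
               + beta * normalize(mi_scores)
               + gamma * normalize(rf_scores))
    weights = normalize(blended)
    # Features whose final weight falls below the threshold are dropped.
    mask = weights >= weight_threshold
    return weights, mask

# Assumed example scores for four features.
mse_s = [0.5, 0.3, 0.2, 0.0]
mi_s = [0.6, 0.3, 0.1, 0.0]
rf_s = [0.7, 0.2, 0.08, 0.02]
weights, mask = combine_weights(mse_s, mi_s, rf_s)
```

With these inputs the blended weights come out to roughly 0.59, 0.27, 0.134, and 0.006, so the fourth feature falls below the 0.05 threshold and is masked out.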
📁 Project Structure
smart_knn/
├── base_knn.py
├── distance.py
├── weight_learning.py
├── data_processing.py
├── utils.py
├── evaluation.py
├── adaptive_k.py
├── prototypes.py
└── signatures.py
docs/
├── design.md
├── theory.md
├── roadmap.md
└── usage.md
benchmarks/
├── classification_tests/
├── regression_tests/
└── heatmaps/
Roadmap
- Adaptive-K
- Prototype compression
- Neural metric learning
- FAISS / HNSW accelerated search
- GPU support
- Distance signatures
- Incremental learning
License
SmartKNN is released under the MIT License.
See the LICENSE file for details.
Contributing
PRs and feature requests are welcome! If you like SmartKNN, star the repository.