A smarter, weighted, feature-selective KNN algorithm with automatic preprocessing.
SmartKNN
A weighted, feature-selective KNN algorithm that automatically learns feature importance, filters weak features, handles missing values, and normalizes data. It aims to improve on classic KNN while keeping a plug-and-play sklearn-like API.
SmartKNN works for both classification and regression with no additional settings.
Key Features
- Automatic feature weighting using:
  - Univariate MSE scoring
  - Mutual Information
  - Random Forest importance
- Automatic normalization of all input data
- NaN / Inf handling (both training and prediction)
- Automatic feature filtering using learned weights
- Weighted Euclidean distance for more accurate neighbor selection
- Works out-of-the-box for classification & regression
- Scikit-learn style API (fit, predict, kneighbors)
- Supports NumPy arrays and Pandas DataFrames
- Fast batch distance computation
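To make the "Univariate MSE scoring" idea concrete, here is a minimal NumPy sketch. It is not SmartKNN's actual implementation: the helper name `univariate_mse_weights`, the single-feature linear fit, and the inverse-MSE normalization are all assumptions chosen for illustration.

```python
import numpy as np

def univariate_mse_weights(X, y, eps=1e-12):
    """Hypothetical helper: weight each feature by how well it alone predicts y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n_features = X.shape[1]
    mse = np.empty(n_features)
    for j in range(n_features):
        # Fit y ~ a * x_j + b by least squares using only feature j.
        A = np.column_stack([X[:, j], np.ones(len(y))])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        mse[j] = np.mean((A @ coef - y) ** 2)
    # Lower error means a more useful feature: invert, then normalize to sum to 1.
    inv = 1.0 / (mse + eps)
    return inv / inv.sum()

# Synthetic data where only feature 0 carries signal.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
y_demo = 3.0 * X_demo[:, 0] + 0.1 * rng.normal(size=200)

weights_demo = univariate_mse_weights(X_demo, y_demo)
mask_demo = weights_demo >= 0.05  # assumed filtering rule: keep features above a threshold
print(weights_demo, mask_demo)
```

On this synthetic data nearly all of the weight lands on feature 0, so a threshold such as 0.05 would keep that feature and drop the other two.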
Installation
pip install smart-knn
(If installing locally)
pip install .
Quick Start (Most Common Usage)
import pandas as pd
from smart_knn import SmartKNN
# Load your dataset
# Replace "target" with your actual label column
df = pd.read_csv("data.csv")
X = df.drop("target", axis=1)
y = df["target"]
# Train the model
model = SmartKNN(k=5)
model.fit(X, y)
# Predict for a single sample
sample = X.iloc[0]
pred = model.predict(sample)
print("Prediction:", pred)
SmartKNN automatically:
- Normalizes features
- Learns weights
- Filters useless features
- Cleans NaN / Inf values
- Prepares optimized distance functions
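The cleaning and normalization steps listed above could look roughly like the sketch below. It assumes median imputation for NaN / Inf and z-score scaling; the README does not say which strategy SmartKNN uses, and `fit_preprocessor` and `transform` are hypothetical names.

```python
import numpy as np

def fit_preprocessor(X):
    """Hypothetical helper: learn fill values and scaling from training data."""
    X = np.asarray(X, dtype=float)
    X = np.where(np.isfinite(X), X, np.nan)        # treat +/-Inf like NaN
    fill = np.nanmedian(X, axis=0)                 # per-column median, ignoring NaN
    fill = np.where(np.isnan(fill), 0.0, fill)     # all-missing column falls back to 0
    X_filled = np.where(np.isnan(X), fill, X)
    mean = X_filled.mean(axis=0)
    std = X_filled.std(axis=0)
    std = np.where(std == 0, 1.0, std)             # constant columns: avoid dividing by zero
    return fill, mean, std

def transform(X, fill, mean, std):
    """Apply the learned fill values and scaling to training or prediction data."""
    X = np.asarray(X, dtype=float)
    X = np.where(np.isfinite(X), X, fill)          # fill broadcasts across rows
    return (X - mean) / std

X_train_demo = np.array([[1.0, 10.0], [np.nan, 20.0], [3.0, np.inf], [5.0, 30.0]])
fill, mean, std = fit_preprocessor(X_train_demo)
X_clean = transform(X_train_demo, fill, mean, std)
new_sample = transform(np.array([np.nan, 20.0]), fill, mean, std)
```

Learning the statistics once on training data and reusing them at prediction time keeps both stages on the same scale, which matters for any distance-based model.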
🔮 Predict on Multiple Samples
# Predict on first 10 rows
preds = model.predict(X.iloc[:10])
print(preds)
How It Works (Simple Explanation)
SmartKNN improves KNN by:
- Finding which features matter using MSE, MI, and Random Forest scoring.
- Removing useless features based on weights.
- Normalizing everything to prevent scale bias.
- Applying weighted Euclidean distance instead of plain distance.
- Using NumPy-optimized batch computations for fast inference.
Compared with plain KNN, this is intended to give:
- Higher accuracy
- Faster predictions
- Lower noise sensitivity
- Adaptive feature selection
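As an illustration of the steps above, the following sketch filters features by weight and then runs a weighted Euclidean nearest-neighbor vote. The function `weighted_knn_predict` and its parameters are hypothetical and only mirror the `k` and `weight_threshold` options shown in this README. It assumes the distance has the form sqrt(sum_j w_j * (x_j - q_j)^2).

```python
import numpy as np
from collections import Counter

def weighted_knn_predict(X_train, y_train, query, weights, k=5,
                         weight_threshold=0.05, classify=True):
    """Hypothetical sketch: filter features by weight, then vote among k neighbors."""
    X_train = np.asarray(X_train, dtype=float)
    query = np.asarray(query, dtype=float)
    weights = np.asarray(weights, dtype=float)
    mask = weights >= weight_threshold             # drop weak features
    Xm, qm, wm = X_train[:, mask], query[mask], weights[mask]
    dists = np.sqrt(((Xm - qm) ** 2 * wm).sum(axis=1))
    nearest = np.argsort(dists)[:k]
    labels = np.asarray(y_train)[nearest]
    if classify:
        return Counter(labels.tolist()).most_common(1)[0][0]  # majority vote
    return float(labels.mean())                                # regression: mean target

# Feature 0 separates the classes; feature 1 is large-scale noise.
X_toy = [[0.0, 100.0], [0.1, -50.0], [0.2, 30.0], [1.0, 0.0], [1.1, 5.0], [0.9, -3.0]]
y_toy = [0, 0, 0, 1, 1, 1]
query = [0.05, 2.0]

pred_weighted = weighted_knn_predict(X_toy, y_toy, query, [0.98, 0.02], k=3)
pred_plain = weighted_knn_predict(X_toy, y_toy, query, [1.0, 1.0], k=3, weight_threshold=0.0)
print(pred_weighted, pred_plain)
```

In this toy example the noisy second feature dominates the plain distance and pulls the query toward class 1, while the weighted, filtered version recovers class 0.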
🔬 API Overview
Initialize
model = SmartKNN(k=5, weight_threshold=0.05)
Fit
model.fit(X, y)
Predict
pred = model.predict(sample)
Neighbors
idx, dists = model.kneighbors(sample)
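For intuition about what a batched neighbor lookup might compute, here is a NumPy sketch that returns indices and distances in the same `(idx, dists)` order as the call above. `batch_kneighbors` is a hypothetical stand-in, not the library's code, and the output shapes of the real `kneighbors` are not documented here.

```python
import numpy as np

def batch_kneighbors(X_train, queries, weights, k=5):
    """Hypothetical sketch: k nearest training rows for every query at once."""
    X_train = np.asarray(X_train, dtype=float)
    Q = np.atleast_2d(np.asarray(queries, dtype=float))
    w = np.asarray(weights, dtype=float)
    # Scaling columns by sqrt(w) turns weighted distance into plain Euclidean distance.
    Xs, Qs = X_train * np.sqrt(w), Q * np.sqrt(w)
    # ||q - x||^2 = ||q||^2 + ||x||^2 - 2 q.x, evaluated for all pairs in one matmul.
    sq = (Qs ** 2).sum(axis=1)[:, None] + (Xs ** 2).sum(axis=1)[None, :] - 2.0 * Qs @ Xs.T
    sq = np.maximum(sq, 0.0)                       # clip tiny negatives from rounding
    idx = np.argsort(sq, axis=1)[:, :k]
    dists = np.sqrt(np.take_along_axis(sq, idx, axis=1))
    return idx, dists

X_small = [[0.0, 0.0], [1.0, 0.0], [0.0, 4.0], [3.0, 3.0]]
idx_demo, dists_demo = batch_kneighbors(X_small, [[0.0, 0.0], [3.0, 3.0]], [1.0, 0.25], k=2)
print(idx_demo)
print(dists_demo)
```

Expanding the squared distance this way avoids a Python loop over queries, which is one common way to get fast batch distance computation in NumPy.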
Inspect internals
model.weights_ # Final feature weights
model.feature_mask_ # Which features were kept
model.X_.shape # Reduced feature matrix
Project Structure
smart_knn/
├── base_knn.py
├── distance.py
├── weight_learning.py
├── data_processing.py
├── utils.py
├── evaluation.py
├── adaptive_k.py (future)
├── prototypes.py (future)
└── signatures.py (future)
Additional documentation in:
- docs/design.md: internal architecture
- docs/theory.md: math and algorithms
- docs/usage.md: extended usage examples
- docs/roadmap.md: future improvements
Roadmap
- Adaptive-K optimization
- Prototype compression
- Distance signatures
- GPU acceleration
- Incremental learning support
- Batch offline inference
License
This project is licensed under the MIT License. See the LICENSE file.
Contributing
PRs, suggestions, and feature requests are welcome! If you like the project, star it on GitHub.
Support
Have issues or questions? Open an issue on GitHub or message your friendly AI assistant.