A smarter, weighted, feature-selective KNN algorithm with automatic preprocessing.

SmartKNN

A weighted, feature-selective KNN algorithm that automatically learns feature importance, filters weak features, handles missing values, and normalizes data. It aims to improve on classic KNN while keeping a plug-and-play, scikit-learn-like API.

SmartKNN works for both classification and regression with no additional settings.


Key Features

  • Automatic feature weighting using:

    • Univariate MSE scoring
    • Mutual Information
    • Random Forest importance
  • Automatic normalization of all input data

  • NaN / Inf handling (both training and prediction)

  • Automatic feature filtering using learned weights

  • Weighted Euclidean distance for more accurate neighbor selection

  • Works out-of-the-box for classification & regression

  • Scikit-learn style API (fit, predict, kneighbors)

  • Supports NumPy arrays and Pandas DataFrames

  • Fast batch distance computation
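As a rough illustration of the weighted Euclidean distance mentioned above, the sketch below computes d(x, q) = sqrt(sum_j w_j * (x_j - q_j)^2) from one query to every training row. The function name `weighted_euclidean` and the toy data are assumptions made for this example; they are not taken from the SmartKNN source.

```python
import numpy as np

def weighted_euclidean(X, q, w):
    """Distance from query q (n_features,) to each row of X (n_samples, n_features)."""
    diff = X - q                                  # broadcast the query across all rows
    return np.sqrt((w * diff ** 2).sum(axis=1))   # per-feature weights scale each squared gap

X = np.array([[0.0, 0.0], [3.0, 4.0], [1.0, 10.0]])
q = np.array([0.0, 0.0])

plain = weighted_euclidean(X, q, np.array([1.0, 1.0]))     # equal weights: ordinary Euclidean
weighted = weighted_euclidean(X, q, np.array([1.0, 0.0]))  # zero weight: second feature ignored

print(plain)
print(weighted)
```

With equal weights the third row is the farthest from the query; once the second feature is given zero weight it becomes closer than the second row, which is how learned weights can change which neighbors are selected.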


Installation

pip install smart-knn

To install from a local checkout instead:

pip install .

Quick Start (Most Common Usage)

import pandas as pd
from smart_knn import SmartKNN

# Load your dataset
# Replace "target" with your actual label column
df = pd.read_csv("data.csv")
X = df.drop("target", axis=1)
y = df["target"]

# Train the model
model = SmartKNN(k=5)
model.fit(X, y)

# Predict for a single sample
sample = X.iloc[0]
pred = model.predict(sample)
print("Prediction:", pred)

SmartKNN automatically:

  • Normalizes features
  • Learns weights
  • Filters useless features
  • Cleans NaN / Inf values
  • Prepares optimized distance functions
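The cleaning and normalization steps listed above could look roughly like the following. This is a minimal sketch under stated assumptions: median imputation and z-score scaling are choices made for the example, and `fit_preprocessor` and `transform` are hypothetical helper names, not SmartKNN functions. The point it shows is that statistics learned during training are reused at prediction time.

```python
import numpy as np

def fit_preprocessor(X):
    """Learn per-feature fill values and scaling statistics from training data."""
    X = np.asarray(X, dtype=float)
    X = np.where(np.isfinite(X), X, np.nan)    # treat +/-Inf the same as NaN
    median = np.nanmedian(X, axis=0)           # assumed fill value: per-feature median
    filled = np.where(np.isnan(X), median, X)
    mean = filled.mean(axis=0)
    std = filled.std(axis=0)
    std[std == 0] = 1.0                        # constant features: avoid division by zero
    return median, mean, std

def transform(X, median, mean, std):
    """Apply the training-time statistics to new data (one row or many)."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    X = np.where(np.isfinite(X), X, median)    # fill NaN / Inf with training medians
    return (X - mean) / std

X_train = np.array([[1.0, 10.0], [2.0, np.nan], [3.0, 30.0], [np.inf, 20.0]])
median, mean, std = fit_preprocessor(X_train)

Z_train = transform(X_train, median, mean, std)
z_query = transform([np.nan, 20.0], median, mean, std)   # a sample with a missing value
print(Z_train)
print(z_query)
```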

Predict on Multiple Samples

# Predict on first 10 rows
preds = model.predict(X.iloc[:10])
print(preds)
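Predicting for several rows at once means computing a distance from every query to every training row. One common way to do that without Python loops is to expand the squared distance into matrix products, as sketched below. The function `pairwise_weighted_sq_dists` and the random data are assumptions for illustration; SmartKNN's internal batch routine may be organized differently.

```python
import numpy as np

def pairwise_weighted_sq_dists(Q, X, w):
    """Squared weighted distances between queries Q and rows X, shape (n_queries, n_train)."""
    Qw = Q * w
    q_norm = (Qw * Q).sum(axis=1)[:, None]      # sum_j w_j * q_j^2, one per query
    x_norm = (X * w * X).sum(axis=1)[None, :]   # sum_j w_j * x_j^2, one per training row
    cross = Qw @ X.T                            # sum_j w_j * q_j * x_j for every pair
    d2 = q_norm + x_norm - 2.0 * cross
    return np.maximum(d2, 0.0)                  # clip tiny negatives caused by round-off

rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 4))
Q = rng.normal(size=(10, 4))
w = np.array([0.5, 0.3, 0.2, 0.0])              # hypothetical feature weights

d2 = pairwise_weighted_sq_dists(Q, X_train, w)
nearest = d2.argmin(axis=1)                     # index of the closest training row per query
print(d2.shape, nearest)
```

Working with squared distances is enough for ranking neighbors, since the square root does not change their order.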

How It Works (Simple Explanation)

SmartKNN improves KNN by:

  1. Finding which features matter using MSE, MI, and Random Forest scoring.
  2. Removing useless features based on weights.
  3. Normalizing everything to prevent scale bias.
  4. Applying weighted Euclidean distance instead of plain distance.
  5. Using NumPy-optimized batch computations for fast inference.
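Steps 1 and 2 can be sketched with NumPy alone. The example below covers only a univariate MSE score: each feature is scored by the error of a one-variable linear fit, lower error gives a higher weight, and features under a threshold are dropped. The exact scoring formula, the helper name `univariate_mse_weights`, and the way the threshold is applied are assumptions for illustration; Mutual Information and Random Forest scoring are left out.

```python
import numpy as np

def univariate_mse_weights(X, y, eps=1e-8):
    """Score each feature by how well a one-variable linear fit predicts y."""
    n_features = X.shape[1]
    mse = np.empty(n_features)
    for j in range(n_features):
        slope, intercept = np.polyfit(X[:, j], y, deg=1)
        residual = y - (slope * X[:, j] + intercept)
        mse[j] = np.mean(residual ** 2)
    scores = 1.0 / (mse + eps)       # lower error -> higher score (assumed mapping)
    return scores / scores.sum()     # normalize so the weights sum to 1

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 3.0 * X[:, 0] + 0.1 * rng.normal(size=200)   # only feature 0 carries signal

weights = univariate_mse_weights(X, y)
mask = weights >= 0.05               # hypothetical cut-off, mirroring weight_threshold=0.05
X_reduced = X[:, mask]               # keep only features that pass the threshold

print(weights)
print(mask, X_reduced.shape)
```

In this toy dataset only the first feature passes the threshold, so the two noise features no longer take part in the distance computation.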

Compared with classic KNN, this is intended to give:

  • Higher accuracy
  • Faster predictions
  • Lower noise sensitivity
  • Adaptive feature selection

API Overview

Initialize

model = SmartKNN(k=5, weight_threshold=0.05)

Fit

model.fit(X, y)

Predict

pred = model.predict(sample)

Neighbors

idx, dists = model.kneighbors(sample)
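To make the neighbor lookup concrete, here is a minimal sketch of what a k-nearest search returns and how the same neighbors can serve both classification (majority vote) and regression (average). The helper `k_nearest`, the unweighted distance, and the aggregation rules are assumptions for this example and are not taken from the SmartKNN source; only the (idx, dists) return order follows the call shown above.

```python
import numpy as np
from collections import Counter

def k_nearest(X_train, q, k):
    """Return indices and distances of the k training rows closest to q."""
    dists = np.sqrt(((X_train - q) ** 2).sum(axis=1))
    idx = np.argsort(dists, kind="stable")[:k]   # closest first, ties broken by row order
    return idx, dists[idx]

X_train = np.array([[0.0], [1.0], [2.0], [10.0], [11.0]])
y_class = np.array(["a", "a", "a", "b", "b"])
y_reg = np.array([0.0, 1.0, 2.0, 10.0, 11.0])

idx, dists = k_nearest(X_train, np.array([1.2]), k=3)

label = Counter(y_class[idx]).most_common(1)[0][0]   # classification: majority vote
value = y_reg[idx].mean()                            # regression: mean of neighbor targets

print(idx, dists)
print(label, value)
```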

Inspect internals

model.weights_        # Final feature weights
model.feature_mask_   # Which features were kept
model.X_.shape        # Shape of the reduced feature matrix

Project Structure

smart_knn/
 ├── base_knn.py
 ├── distance.py
 ├── weight_learning.py
 ├── data_processing.py
 ├── utils.py
 ├── evaluation.py
 ├── adaptive_k.py (future)
 ├── prototypes.py (future)
 └── signatures.py (future)

Additional documentation in:

  • docs/design.md — internal architecture
  • docs/theory.md — math and algorithms
  • docs/usage.md — extended usage examples
  • docs/roadmap.md — future improvements

Roadmap

  • Adaptive-K optimization
  • Prototype compression
  • Distance signatures
  • GPU acceleration
  • Incremental learning support
  • Batch offline inference

License

This project is licensed under the MIT License. See the LICENSE file.


Contributing

PRs, suggestions, and feature requests are welcome! If you like the project, star it on GitHub.


Support

Have issues or questions? Open an issue on GitHub or message your friendly AI assistant.
