
VishuML

A comprehensive machine learning library implementing fundamental algorithms from scratch in Python. This library provides educational implementations of popular ML algorithms without relying on external ML frameworks like scikit-learn.

Features

🎯 sklearn-compatible API - Works seamlessly with pandas DataFrames and CSV data!

VishuML implements the following machine learning algorithms:

Supervised Learning

  • Linear Regression - For continuous target prediction
  • Logistic Regression - For binary classification
  • K-Nearest Neighbors (KNN) - For classification and regression
  • Support Vector Machine (SVM) - For binary classification with linear and RBF kernels
  • Decision Tree - For classification using CART algorithm
  • Naive Bayes - Gaussian Naive Bayes for classification
  • Perceptron - Linear binary classifier

Unsupervised Learning

  • K-Means Clustering - For data clustering

Utilities

  • Data splitting (train/test split)
  • Evaluation metrics (accuracy, R², MSE)
  • Distance functions
  • Data normalization
  • Confusion matrix
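These metrics follow their standard definitions; for intuition, here is a minimal NumPy sketch of how accuracy, MSE, and R² are conventionally computed (the function names here are illustrative, and vishuml's own implementations may differ in detail):

```python
import numpy as np

def accuracy(y_true, y_pred):
    # Fraction of predictions that exactly match the labels
    return np.mean(np.asarray(y_true) == np.asarray(y_pred))

def mse(y_true, y_pred):
    # Mean of squared residuals
    return np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)

def r2(y_true, y_pred):
    # 1 - (residual sum of squares / total sum of squares)
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

print(r2([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # perfect fit -> 1.0
```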

Installation

From PyPI

pip install vishuml

From Source

git clone https://github.com/vishuRizz/vishuml.git
cd vishuml
pip install -e .

Quick Start

🚀 Works with pandas DataFrames (just like sklearn!)

import pandas as pd
from vishuml import LinearRegression, LogisticRegression
from vishuml.utils import train_test_split, r2_score, accuracy_score

# Load your CSV data (just like sklearn!)
df = pd.read_csv('your_data.csv')
X = df[['feature1', 'feature2', 'feature3']]  # Select features
y = df['target']                               # Select target

# Train-test split (works with DataFrames!)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train model (accepts DataFrames!)
model = LinearRegression()
model.fit(X_train, y_train)  # DataFrame input!

# Make predictions (works with DataFrames!)
predictions = model.predict(X_test)
score = model.score(X_test, y_test)
print(f"R² Score: {score:.4f}")

# Classification Example with real data
from vishuml import datasets as ds
X, y = ds.load_iris()

# Convert to DataFrame for realistic workflow
iris_df = pd.DataFrame(X, columns=['sepal_length', 'sepal_width', 'petal_length', 'petal_width'])
iris_df['species'] = y

# sklearn-like feature selection
features = iris_df[['sepal_length', 'sepal_width', 'petal_length', 'petal_width']]
target = (iris_df['species'] == 0).astype(int)  # Binary classification

X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.3)

classifier = LogisticRegression()
classifier.fit(X_train, y_train)  # DataFrame input!
accuracy = classifier.score(X_test, y_test)
print(f"Accuracy: {accuracy:.4f}")

Traditional NumPy Arrays

import numpy as np
from vishuml import LinearRegression, KMeans

# NumPy arrays also work (backward compatibility)
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 4, 6, 8, 10])

model = LinearRegression()
model.fit(X, y)
predictions = model.predict([[6], [7]])
print(f"Predictions: {predictions}")  # Should be close to [12, 14]

# Clustering Example
X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])
kmeans = KMeans(k=2, random_state=42)
clusters = kmeans.fit_predict(X)
print(f"Cluster labels: {clusters}")

Algorithm Documentation

Linear Regression

from vishuml import LinearRegression

# Create and train model
model = LinearRegression(fit_intercept=True)
model.fit(X_train, y_train)

# Make predictions
predictions = model.predict(X_test)

# Get R² score
score = model.score(X_test, y_test)

Logistic Regression

from vishuml import LogisticRegression

# Create and train model
model = LogisticRegression(learning_rate=0.01, max_iterations=1000)
model.fit(X_train, y_train)

# Make predictions
predictions = model.predict(X_test)
probabilities = model.predict_proba(X_test)

# Get accuracy
accuracy = model.score(X_test, y_test)
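The learning_rate and max_iterations parameters suggest the standard from-scratch formulation: batch gradient descent on the log-loss. A minimal, self-contained sketch of that training loop (illustrative only, not vishuml's actual code):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, learning_rate=0.1, max_iterations=1000):
    # Batch gradient descent on the mean log-loss; bias handled via a ones column.
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    w = np.zeros(Xb.shape[1])
    for _ in range(max_iterations):
        p = sigmoid(Xb @ w)                          # predicted probabilities
        w -= learning_rate * Xb.T @ (p - y) / len(y)  # gradient step
    return w

# Toy 1-D problem: the decision boundary should land near x = 1.5
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
w = fit_logistic(X, y)
probs = sigmoid(np.hstack([np.ones((4, 1)), X]) @ w)
print((probs >= 0.5).astype(int))  # [0 0 1 1]
```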

K-Nearest Neighbors

from vishuml import KNearestNeighbors

# For classification
knn_clf = KNearestNeighbors(k=3, task_type='classification')
knn_clf.fit(X_train, y_train)
predictions = knn_clf.predict(X_test)

# For regression
knn_reg = KNearestNeighbors(k=5, task_type='regression')
knn_reg.fit(X_train, y_train)
predictions = knn_reg.predict(X_test)

Support Vector Machine

from vishuml import SupportVectorMachine

# Linear SVM
svm_linear = SupportVectorMachine(C=1.0, kernel='linear')
svm_linear.fit(X_train, y_train)

# RBF SVM
svm_rbf = SupportVectorMachine(C=1.0, kernel='rbf', gamma=1.0)
svm_rbf.fit(X_train, y_train)

predictions = svm_rbf.predict(X_test)
decision_scores = svm_rbf.decision_function(X_test)
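For reference, an RBF kernel with parameter gamma is conventionally K(x, z) = exp(-gamma * ||x - z||^2); a one-function NumPy sketch of that formula (illustrative, not vishuml's internals):

```python
import numpy as np

def rbf_kernel(x, z, gamma=1.0):
    # Gaussian/RBF kernel: similarity decays with squared Euclidean distance
    x, z = np.asarray(x, dtype=float), np.asarray(z, dtype=float)
    return np.exp(-gamma * np.sum((x - z) ** 2))

print(rbf_kernel([0.0, 0.0], [0.0, 0.0]))  # identical points -> 1.0
```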

Decision Tree

from vishuml import DecisionTree

# Create and train model
tree = DecisionTree(max_depth=5, min_samples_split=2, min_samples_leaf=1)
tree.fit(X_train, y_train)

# Make predictions
predictions = tree.predict(X_test)
accuracy = tree.score(X_test, y_test)

Naive Bayes

from vishuml import NaiveBayes

# Create and train model
nb = NaiveBayes()
nb.fit(X_train, y_train)

# Make predictions
predictions = nb.predict(X_test)
probabilities = nb.predict_proba(X_test)

Perceptron

from vishuml import Perceptron

# Create and train model
perceptron = Perceptron(learning_rate=0.01, max_iterations=1000)
perceptron.fit(X_train, y_train)

# Make predictions
predictions = perceptron.predict(X_test)
decision_scores = perceptron.decision_function(X_test)
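The classic perceptron rule behind this kind of API is: for each misclassified sample, nudge the weights by learning_rate * y * x, with labels in {-1, +1}. A minimal from-scratch sketch, independent of vishuml:

```python
import numpy as np

def fit_perceptron(X, y, learning_rate=0.1, max_iterations=1000):
    # Labels must be in {-1, +1}; bias folded in via a column of ones.
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    w = np.zeros(Xb.shape[1])
    for _ in range(max_iterations):
        updated = False
        for xi, yi in zip(Xb, y):
            if yi * (xi @ w) <= 0:        # misclassified (or on the boundary)
                w += learning_rate * yi * xi
                updated = True
        if not updated:                   # converged: every point classified
            break
    return w

X = np.array([[0.0], [1.0], [3.0], [4.0]])
y = np.array([-1.0, -1.0, 1.0, 1.0])
w = fit_perceptron(X, y)
preds = np.sign(np.hstack([np.ones((4, 1)), X]) @ w)
print(preds)  # recovers the training labels on this separable data
```

On linearly separable data like this, the perceptron convergence theorem guarantees the loop terminates with all points correctly classified.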

K-Means Clustering

from vishuml import KMeans

# Create and train model
kmeans = KMeans(k=3, init='k-means++', random_state=42)
kmeans.fit(X)

# Get cluster labels
labels = kmeans.labels
# Or predict for new data
new_labels = kmeans.predict(X_new)

# Transform to distance space
distances = kmeans.transform(X)
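Under the hood, K-Means is usually Lloyd's algorithm: alternately assign each point to its nearest centroid, then recompute each centroid as the mean of its cluster. A compact sketch in plain NumPy (simple random init rather than k-means++, so this is a simplification of what vishuml offers):

```python
import numpy as np

def kmeans(X, k, n_iter=100, random_state=0):
    rng = np.random.default_rng(random_state)
    X = np.asarray(X, dtype=float)
    # Init: pick k distinct data points as starting centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: label each point with its nearest centroid
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Update step: each centroid becomes the mean of its assigned points
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):  # converged
            break
        centroids = new_centroids
    return labels, centroids

X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])
labels, centroids = kmeans(X, k=2)
print(labels)
```

Note that the result depends on the initial centroids, which is why a smarter init such as k-means++ (and a random_state for reproducibility) is useful in practice.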

Utility Functions

from vishuml.utils import (
    train_test_split, accuracy_score, r2_score,
    mean_squared_error, euclidean_distance,
    normalize, confusion_matrix
)

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Evaluate predictions
accuracy = accuracy_score(y_true, y_pred)
r2 = r2_score(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)

# Normalize features
X_normalized = normalize(X)

# Confusion matrix
cm = confusion_matrix(y_true, y_pred)

Sample Datasets

The library includes sample datasets in CSV format:

  • datasets/iris.csv - Classic iris flower classification dataset
  • datasets/housing.csv - Housing price regression dataset
  • datasets/wine.csv - Wine quality classification dataset

Load them with pandas:

import pandas as pd

# Load sample datasets
iris_data = pd.read_csv('datasets/iris.csv')
housing_data = pd.read_csv('datasets/housing.csv')
wine_data = pd.read_csv('datasets/wine.csv')

Examples

Check out the examples/ directory for Jupyter notebook tutorials demonstrating each algorithm:

  • examples/linear_regression_example.ipynb
  • examples/logistic_regression_example.ipynb
  • examples/knn_example.ipynb
  • examples/svm_example.ipynb
  • examples/decision_tree_example.ipynb
  • examples/naive_bayes_example.ipynb
  • examples/perceptron_example.ipynb
  • examples/kmeans_example.ipynb

Development

Setup Development Environment

git clone https://github.com/vishuRizz/vishuml.git
cd vishuml
pip install -e ".[dev]"

Running Tests

pytest tests/ -v --cov=vishuml

Code Formatting

black vishuml/
flake8 vishuml/

Requirements

  • Python >= 3.7
  • NumPy >= 1.19.0

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

Educational Purpose

This library is designed for educational purposes to help understand how machine learning algorithms work under the hood. For production use, consider using mature libraries like scikit-learn, which are more optimized and feature-complete.
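To make "under the hood" concrete: ordinary least squares, for example, can be solved in a few lines via the normal equations (X^T X) w = X^T y. A sketch of that idea (not vishuml's code, which may use a different solver):

```python
import numpy as np

def fit_ols(X, y):
    # Solve the normal equations (X^T X) w = X^T y; bias via a ones column.
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    return np.linalg.solve(Xb.T @ Xb, Xb.T @ y)

X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
w = fit_ols(X, y)          # w[0] is the intercept (~0), w[1] the slope (~2)
print(np.round(w, 6))
```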

Author

Vishu - GitHub Profile

Acknowledgments

  • Inspired by scikit-learn's API design
  • Algorithms implemented based on standard textbook descriptions
  • Built for educational and learning purposes

