# VishuML

A machine learning library implementing algorithms from scratch.
A comprehensive machine learning library implementing fundamental algorithms from scratch in Python. This library provides educational implementations of popular ML algorithms without relying on external ML frameworks like scikit-learn.
## Features

🎯 **sklearn-compatible API** - works seamlessly with pandas DataFrames and CSV data!
VishuML implements the following machine learning algorithms:
### Supervised Learning

- **Linear Regression** - for continuous target prediction
- **Logistic Regression** - for binary classification
- **K-Nearest Neighbors (KNN)** - for classification and regression
- **Support Vector Machine (SVM)** - for binary classification with linear and RBF kernels
- **Decision Tree** - for classification using the CART algorithm
- **Naive Bayes** - Gaussian Naive Bayes for classification
- **Perceptron** - linear binary classifier
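The Decision Tree entry above mentions CART, which scores candidate splits by the drop in Gini impurity. As a from-scratch illustration of that idea (a sketch of the general technique, not VishuML's actual code):

```python
import numpy as np

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

# A pure node has impurity 0; a 50/50 binary node has impurity 0.5
print(gini(np.array([1, 1, 1, 1])))  # 0.0
print(gini(np.array([0, 0, 1, 1])))  # 0.5
```

CART picks the split whose weighted child impurities fall the most below the parent's impurity.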
### Unsupervised Learning

- **K-Means Clustering** - for data clustering

### Utilities
- Data splitting (train/test split)
- Evaluation metrics (accuracy, R², MSE)
- Distance functions
- Data normalization
- Confusion matrix
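The metrics listed above have simple closed forms. A from-scratch sketch in plain NumPy (illustrative only, independent of the actual `vishuml.utils` implementations):

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

def mse(y_true, y_pred):
    """Mean squared error between targets and predictions."""
    return float(np.mean((np.asarray(y_true, float) - np.asarray(y_pred, float)) ** 2))

def r2(y_true, y_pred):
    """R² = 1 - (residual sum of squares / total sum of squares)."""
    y_true = np.asarray(y_true, float)
    y_pred = np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

acc = accuracy([1, 0, 1, 1], [1, 0, 0, 1])   # 0.75: 3 of 4 correct
error = mse([1.0, 2.0], [1.0, 4.0])          # 2.0: mean of (0², 2²)
fit = r2([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])   # 1.0: perfect prediction
```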
## Installation

### From PyPI

```bash
pip install vishuml
```
### From Source

```bash
git clone https://github.com/vishuRizz/vishuml.git
cd vishuml
pip install -e .
```
## Quick Start

### 🚀 Works with pandas DataFrames (just like sklearn!)

```python
import pandas as pd
from vishuml import LinearRegression, LogisticRegression
from vishuml.utils import train_test_split, r2_score, accuracy_score

# Load your CSV data (just like sklearn!)
df = pd.read_csv('your_data.csv')
X = df[['feature1', 'feature2', 'feature3']]  # Select features
y = df['target']                              # Select target

# Train-test split (works with DataFrames!)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train model (accepts DataFrames!)
model = LinearRegression()
model.fit(X_train, y_train)

# Make predictions
predictions = model.predict(X_test)
score = model.score(X_test, y_test)
print(f"R² Score: {score:.4f}")

# Classification example with real data
from vishuml import datasets as ds

X, y = ds.load_iris()

# Convert to a DataFrame for a realistic workflow
iris_df = pd.DataFrame(X, columns=['sepal_length', 'sepal_width', 'petal_length', 'petal_width'])
iris_df['species'] = y

# sklearn-like feature selection
features = iris_df[['sepal_length', 'sepal_width', 'petal_length', 'petal_width']]
target = (iris_df['species'] == 0).astype(int)  # Binary classification

X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.3)

classifier = LogisticRegression()
classifier.fit(X_train, y_train)
accuracy = classifier.score(X_test, y_test)
print(f"Accuracy: {accuracy:.4f}")
```
### Traditional NumPy Arrays

```python
import numpy as np
from vishuml import LinearRegression, KMeans

# NumPy arrays also work (backward compatibility)
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 4, 6, 8, 10])

model = LinearRegression()
model.fit(X, y)
predictions = model.predict([[6], [7]])
print(f"Predictions: {predictions}")  # Should be close to [12, 14]

# Clustering example
X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])
kmeans = KMeans(k=2, random_state=42)
clusters = kmeans.fit_predict(X)
print(f"Cluster labels: {clusters}")
```
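For intuition, the straight-line fit in the example above can be cross-checked with NumPy's least-squares solver. This is a generic sketch of ordinary least squares, not necessarily how VishuML computes its fit internally:

```python
import numpy as np

X = np.array([[1], [2], [3], [4], [5]], dtype=float)
y = np.array([2, 4, 6, 8, 10], dtype=float)

# Prepend a bias column, then solve the least-squares problem A @ w ≈ y
A = np.hstack([np.ones((X.shape[0], 1)), X])
w, *_ = np.linalg.lstsq(A, y, rcond=None)
intercept, slope = w
print(intercept, slope)  # ~0.0 and ~2.0, so predicting x=6 gives ~12
```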
## Algorithm Documentation

### Linear Regression

```python
from vishuml import LinearRegression

# Create and train model
model = LinearRegression(fit_intercept=True)
model.fit(X_train, y_train)

# Make predictions
predictions = model.predict(X_test)

# Get R² score
score = model.score(X_test, y_test)
```
### Logistic Regression

```python
from vishuml import LogisticRegression

# Create and train model
model = LogisticRegression(learning_rate=0.01, max_iterations=1000)
model.fit(X_train, y_train)

# Make predictions
predictions = model.predict(X_test)
probabilities = model.predict_proba(X_test)

# Get accuracy
accuracy = model.score(X_test, y_test)
```
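Logistic regression of this kind is typically trained by gradient descent on sigmoid outputs, which is what the `learning_rate` and `max_iterations` parameters above control. A self-contained sketch of the textbook update rule (an illustration, not VishuML's internal code):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_step(w, X, y, lr):
    """One batch gradient-descent step on the mean logistic loss."""
    p = sigmoid(X @ w)             # predicted probabilities
    grad = X.T @ (p - y) / len(y)  # gradient of the mean log-loss
    return w - lr * grad

# Tiny separable example: bias column + one feature
X = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
w = np.zeros(2)
for _ in range(2000):
    w = gradient_step(w, X, y, lr=0.5)

preds = (sigmoid(X @ w) >= 0.5).astype(int)
print(preds)  # [0 0 1 1]
```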
### K-Nearest Neighbors

```python
from vishuml import KNearestNeighbors

# For classification
knn_clf = KNearestNeighbors(k=3, task_type='classification')
knn_clf.fit(X_train, y_train)
predictions = knn_clf.predict(X_test)

# For regression
knn_reg = KNearestNeighbors(k=5, task_type='regression')
knn_reg.fit(X_train, y_train)
predictions = knn_reg.predict(X_test)
```
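The core of KNN classification is just distances plus a majority vote over the `k` nearest training points. A minimal from-scratch sketch of the idea (not VishuML's implementation):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Classify x by majority vote among its k nearest training points."""
    dists = np.linalg.norm(X_train - x, axis=1)   # Euclidean distances to x
    nearest = np.argsort(dists)[:k]               # indices of the k closest
    return Counter(y_train[nearest]).most_common(1)[0][0]

X_train = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
y_train = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.5, 0.5])))  # 0
print(knn_predict(X_train, y_train, np.array([5.5, 5.5])))  # 1
```

For regression, the vote is replaced by the mean of the neighbors' targets.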
### Support Vector Machine

```python
from vishuml import SupportVectorMachine

# Linear SVM
svm_linear = SupportVectorMachine(C=1.0, kernel='linear')
svm_linear.fit(X_train, y_train)

# RBF SVM
svm_rbf = SupportVectorMachine(C=1.0, kernel='rbf', gamma=1.0)
svm_rbf.fit(X_train, y_train)
predictions = svm_rbf.predict(X_test)
decision_scores = svm_rbf.decision_function(X_test)
```
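The `rbf` kernel referenced above is conventionally K(x, x') = exp(-gamma · ||x - x'||²), with `gamma` matching the constructor parameter. A small sketch, assuming VishuML follows this standard definition:

```python
import numpy as np

def rbf_kernel(x1, x2, gamma=1.0):
    """Gaussian RBF kernel: exp(-gamma * squared Euclidean distance)."""
    diff = np.asarray(x1, float) - np.asarray(x2, float)
    return float(np.exp(-gamma * np.sum(diff ** 2)))

same = rbf_kernel([0, 0], [0, 0])   # 1.0: identical points
near = rbf_kernel([0, 0], [1, 0])   # exp(-1), about 0.3679
```

Larger `gamma` makes the kernel decay faster, so the decision boundary bends more tightly around the training points.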
### Decision Tree

```python
from vishuml import DecisionTree

# Create and train model
tree = DecisionTree(max_depth=5, min_samples_split=2, min_samples_leaf=1)
tree.fit(X_train, y_train)

# Make predictions
predictions = tree.predict(X_test)
accuracy = tree.score(X_test, y_test)
```
### Naive Bayes

```python
from vishuml import NaiveBayes

# Create and train model
nb = NaiveBayes()
nb.fit(X_train, y_train)

# Make predictions
predictions = nb.predict(X_test)
probabilities = nb.predict_proba(X_test)
```
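Gaussian Naive Bayes scores each class with a log prior plus per-feature Gaussian log-likelihoods, then predicts the highest-scoring class. A sketch of that textbook computation (not VishuML's actual code):

```python
import numpy as np

def gaussian_pdf(x, mean, var):
    """Per-feature Gaussian likelihood used by Gaussian Naive Bayes."""
    return np.exp(-((x - mean) ** 2) / (2 * var)) / np.sqrt(2 * np.pi * var)

def class_log_score(x, prior, means, variances):
    """log P(class) + sum of per-feature log-likelihoods (naive independence)."""
    return np.log(prior) + np.sum(np.log(gaussian_pdf(x, means, variances)))

# Toy setup: class 0 centered at 0, class 1 centered at 5, unit variance
x = np.array([4.8])
s0 = class_log_score(x, 0.5, np.array([0.0]), np.array([1.0]))
s1 = class_log_score(x, 0.5, np.array([5.0]), np.array([1.0]))
print(int(s1 > s0))  # 1: x is far more likely under class 1
```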
### Perceptron

```python
from vishuml import Perceptron

# Create and train model
perceptron = Perceptron(learning_rate=0.01, max_iterations=1000)
perceptron.fit(X_train, y_train)

# Make predictions
predictions = perceptron.predict(X_test)
decision_scores = perceptron.decision_function(X_test)
```
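The classic perceptron rule updates the weights only on misclassified points: w += lr · yᵢ · xᵢ with labels in {-1, +1}. A from-scratch sketch of that rule (the library's own label convention may differ):

```python
import numpy as np

def perceptron_fit(X, y, lr=0.1, epochs=20):
    """Train a perceptron with the classic mistake-driven update (labels ±1)."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(Xb, y):
            if yi * (xi @ w) <= 0:   # misclassified (or on the boundary)
                w += lr * yi * xi
    return w

X = np.array([[-2.0], [-1.0], [1.0], [2.0]])
y = np.array([-1, -1, 1, 1])
w = perceptron_fit(X, y)
preds = np.sign(np.hstack([np.ones((4, 1)), X]) @ w)
print(preds)  # [-1. -1.  1.  1.]
```

On linearly separable data like this, the rule is guaranteed to converge to a separating hyperplane.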
### K-Means Clustering

```python
from vishuml import KMeans

# Create and train model
kmeans = KMeans(k=3, init='k-means++', random_state=42)
kmeans.fit(X)

# Get cluster labels
labels = kmeans.labels

# Or predict for new data
new_labels = kmeans.predict(X_new)

# Transform to distance space
distances = kmeans.transform(X)
```
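Each k-means iteration alternates two steps (Lloyd's algorithm): assign every point to its nearest centroid, then recompute each centroid as the mean of its points. A minimal sketch of one such step, not VishuML's internal code:

```python
import numpy as np

def kmeans_step(X, centroids):
    """One Lloyd iteration: nearest-centroid assignment, then mean update."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    new_centroids = np.array(
        [X[labels == j].mean(axis=0) for j in range(len(centroids))]
    )
    return labels, new_centroids

X = np.array([[1.0, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])
labels, centroids = kmeans_step(X, X[[0, 3]])  # seed with two data points
print(labels)     # [0 0 0 1 1 1]
print(centroids)  # [[ 1.  2.] [10.  2.]]
```

Running the step until assignments stop changing gives the final clustering.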
## Utility Functions

```python
from vishuml.utils import (
    train_test_split, accuracy_score, r2_score,
    mean_squared_error, euclidean_distance,
    normalize, confusion_matrix
)

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Evaluate predictions
accuracy = accuracy_score(y_true, y_pred)
r2 = r2_score(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)

# Normalize features
X_normalized = normalize(X)

# Confusion matrix
cm = confusion_matrix(y_true, y_pred)
```
## Sample Datasets

The library includes sample datasets in CSV format:

- `datasets/iris.csv` - Classic iris flower classification dataset
- `datasets/housing.csv` - Housing price regression dataset
- `datasets/wine.csv` - Wine quality classification dataset
```python
import pandas as pd

# Load sample datasets
iris_data = pd.read_csv('datasets/iris.csv')
housing_data = pd.read_csv('datasets/housing.csv')
wine_data = pd.read_csv('datasets/wine.csv')
```
## Examples

Check out the `examples/` directory for Jupyter notebook tutorials demonstrating each algorithm:

- `examples/linear_regression_example.ipynb`
- `examples/logistic_regression_example.ipynb`
- `examples/knn_example.ipynb`
- `examples/svm_example.ipynb`
- `examples/decision_tree_example.ipynb`
- `examples/naive_bayes_example.ipynb`
- `examples/perceptron_example.ipynb`
- `examples/kmeans_example.ipynb`
## Development

### Setup Development Environment

```bash
git clone https://github.com/vishuRizz/vishuml.git
cd vishuml
pip install -e ".[dev]"
```
### Running Tests

```bash
pytest tests/ -v --cov=vishuml
```
### Code Formatting

```bash
black vishuml/
flake8 vishuml/
```
## Requirements
- Python >= 3.7
- NumPy >= 1.19.0
## License
This project is licensed under the MIT License - see the LICENSE file for details.
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
## Educational Purpose
This library is designed for educational purposes to help understand how machine learning algorithms work under the hood. For production use, consider using mature libraries like scikit-learn, which are more optimized and feature-complete.
## Author

Vishu - [vishuRizz on GitHub](https://github.com/vishuRizz)
## Acknowledgments
- Inspired by scikit-learn's API design
- Algorithms implemented based on standard textbook descriptions
- Built for educational and learning purposes
## Download Files
### vishuml-0.1.2.tar.gz (source distribution)

- Size: 26.0 kB
- Uploaded using Trusted Publishing: No
- Uploaded via: twine/6.1.0 on CPython/3.12.9

| Algorithm | Hash digest |
|---|---|
| SHA256 | `32cb6c482274642e796a978bf076e7945d1fe4f9ad97ac408702fc25dedacb9f` |
| MD5 | `b6a4101835215ee4d161412d052259e3` |
| BLAKE2b-256 | `4d2abd4c24252195d563659023a9a01251d8c0dcf78767d58e2064f83655f5ad` |
### vishuml-0.1.2-py3-none-any.whl (built distribution, Python 3)

- Size: 30.9 kB
- Uploaded using Trusted Publishing: No
- Uploaded via: twine/6.1.0 on CPython/3.12.9

| Algorithm | Hash digest |
|---|---|
| SHA256 | `4cf9a94742b0dddca778b362691cc413fa7dec284f1e2a662c7e288f6752d472` |
| MD5 | `556377fc15003afb70e8b26929d710f1` |
| BLAKE2b-256 | `e413ddc665cb077d46ca7003fc06eefbc4e1cdfd951080c72b03d76ed1d3cd73` |