A from-scratch machine-learning library: regression, preprocessing, KNN, neural networks, diagnostics, and more.
mypackage-alina
A from-scratch machine-learning library for regression, preprocessing, neighbours, neural networks, decomposition, diagnostics, and more, implemented in pure NumPy for educational clarity and portfolio value.
Version: 0.1.0
Installation
# From PyPI
pip install alina-mypackage
# With optional visualization support
pip install "alina-mypackage[viz]"
# Editable / development install from source
# (replace the URL with your repository if you host it)
git clone https://github.com/yourusername/mypackage-alina.git
cd mypackage-alina
pip install -e ".[dev]"
Importing from mypackage
# The distribution is installed as alina-mypackage but imported as mypackage
from mypackage import OLSRegression, StandardScaler, KNNRegressor, PCA
Features
| Module | Algorithms / Tools |
|---|---|
| regression | OLS, Ridge (L2), Lasso (L1) |
| neighbors | KNN Classifier, KNN Regressor |
| neural_networks | Single-layer Perceptron (regression) |
| preprocessing | StandardScaler, OneHotEncoder, TargetEncoder, MissingValueHandler, OutlierHandler, PolynomialFeatures, TargetTransformer, FeatureSelector |
| metrics | MSE, RMSE, MAE, R² |
| model_selection | train_test_split, CrossValidation (k-fold) |
| diagnostics | VIF, NormalityTest, HeteroscedasticityTest |
| feature_selection | ForwardSelection, BackwardElimination |
| decomposition | PCA |
| visualization | RegressionPlots (actual vs predicted, residuals, histogram) |
Quick Start
import numpy as np
from mypackage import (
OLSRegression,
StandardScaler,
train_test_split,
mse, rmse, r2_score,
)
# --- Synthetic data ---
rng = np.random.default_rng(42)
X = rng.standard_normal((100, 3))
y = X @ np.array([2.0, -1.5, 0.8]) + rng.standard_normal(100) * 0.3
# --- Split ---
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# --- Preprocess ---
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
# --- Fit ---
model = OLSRegression()
model.fit(X_train, y_train)
# --- Evaluate ---
y_pred = model.predict(X_test)
print(f"MSE: {mse(y_test, y_pred):.4f}")
print(f"RMSE: {rmse(y_test, y_pred):.4f}")
print(f"R²: {r2_score(y_test, y_pred):.4f}")
Algorithms in Detail
Regression
OLS (Ordinary Least Squares)
Solves the normal equation analytically using the Moore-Penrose pseudo-inverse for numerical stability.
from mypackage import OLSRegression
model = OLSRegression()
model.fit(X_train, y_train)
print(model.coef_, model.intercept_)
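To make the pseudo-inverse idea concrete, here is a minimal NumPy sketch of a least-squares fit. It is not the library's `OLSRegression` source; the helper name `ols_fit` and the choice to prepend a column of ones for the intercept are assumptions made for illustration.

```python
import numpy as np

def ols_fit(X, y):
    """Least-squares fit via the Moore-Penrose pseudo-inverse (hypothetical helper)."""
    # Prepend a column of ones so the first coefficient acts as the intercept.
    X_design = np.column_stack([np.ones(X.shape[0]), X])
    # pinv is built on the SVD, so it still returns a solution
    # when X_design.T @ X_design is singular or badly conditioned.
    beta = np.linalg.pinv(X_design) @ y
    return beta[0], beta[1:]  # intercept, coefficients

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))
true_coef = np.array([2.0, -1.5, 0.8])
y = 4.0 + X @ true_coef  # noiseless, so the fit should recover the parameters

intercept, coef = ols_fit(X, y)
print(intercept, coef)
```

Compared with inverting `X.T @ X` directly, the pseudo-inverse avoids failing outright on collinear features, at the cost of a full SVD.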
Ridge Regression (L2)
from mypackage import RidgeRegression
model = RidgeRegression(learning_rate=0.001, epochs=10_000, lambda_param=0.01)
model.fit(X_train, y_train)
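The `learning_rate` and `epochs` arguments suggest that the model is trained iteratively. The sketch below shows one way gradient descent with an L2 penalty can be written; the exact loss scaling, the unpenalized intercept, and the helper name `ridge_gd` are assumptions, not the library's confirmed behaviour.

```python
import numpy as np

def ridge_gd(X, y, learning_rate=0.05, epochs=2000, lambda_param=0.1):
    """Gradient descent on mean squared error plus lambda_param * ||w||^2 (illustrative)."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(epochs):
        residual = X @ w + b - y
        # The L2 penalty adds 2 * lambda * w to the gradient; the intercept is not penalized.
        grad_w = (2.0 / n_samples) * (X.T @ residual) + 2.0 * lambda_param * w
        grad_b = (2.0 / n_samples) * residual.sum()
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))
y = 4.0 + X @ np.array([2.0, -1.5, 0.8]) + 0.3 * rng.standard_normal(200)

lam = 0.1
w_ridge, b_ridge = ridge_gd(X, y, lambda_param=lam)
w_plain, _ = ridge_gd(X, y, lambda_param=0.0)
# The penalized weights have a smaller norm than the unpenalized ones.
print(np.linalg.norm(w_ridge), np.linalg.norm(w_plain))
```

Scaling the features first (for example with `StandardScaler`) keeps a single learning rate suitable for every coefficient.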
Lasso Regression (L1)
from mypackage import LassoRegression
model = LassoRegression(learning_rate=0.001, epochs=10_000, lambda_param=0.01)
model.fit(X_train, y_train)
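The L1 penalty is not differentiable at zero, so a gradient-based Lasso needs a subgradient (or a proximal step). The following sketch uses plain subgradient descent with `np.sign`; this is one possible approach under assumed loss scaling, and the helper name `lasso_subgradient` is hypothetical rather than part of the package.

```python
import numpy as np

def lasso_subgradient(X, y, learning_rate=0.01, epochs=3000, lambda_param=0.5):
    """Subgradient descent on mean squared error plus lambda_param * ||w||_1 (illustrative)."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(epochs):
        residual = X @ w + b - y
        # np.sign(w) is a subgradient of ||w||_1 (it returns 0 where w is exactly 0).
        grad_w = (2.0 / n_samples) * (X.T @ residual) + lambda_param * np.sign(w)
        grad_b = (2.0 / n_samples) * residual.sum()
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))
y = 4.0 + X @ np.array([2.0, 0.0, -1.0])  # the second feature is irrelevant

w_l1, _ = lasso_subgradient(X, y, lambda_param=0.5)
w_plain, _ = lasso_subgradient(X, y, lambda_param=0.0)
print(w_l1, w_plain)
```

With a constant step size, subgradient descent only hovers near zero for irrelevant features instead of setting them exactly to zero; coordinate descent or a soft-thresholding step would produce exact sparsity.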
Preprocessing
from mypackage import (
StandardScaler,
MissingValueHandler,
OutlierHandler,
OneHotEncoder,
PolynomialFeatures,
TargetTransformer,
FeatureSelector,
)
# Handle missing values (X_with_nans: a feature array containing np.nan entries)
handler = MissingValueHandler(strategy="mean")
X_clean = handler.fit_transform(X_with_nans)
# Remove outliers (IQR fence)
oh = OutlierHandler()
X_no_outliers = oh.fit_transform(X)
# One-hot encode categorical columns (X_categorical: an array of category labels)
enc = OneHotEncoder()
X_encoded = enc.fit_transform(X_categorical)
# Polynomial features up to degree 2
pf = PolynomialFeatures(degree=2)
X_poly = pf.fit_transform(X)
# Log-transform target (log1p needs y > -1, so apply it to a non-negative target)
tt = TargetTransformer(method="log1p")
y_log = tt.fit_transform(y)
y_original = tt.inverse_transform(y_log)
# Select top-2 features by correlation
fs = FeatureSelector(method="correlation", k=2)
X_selected = fs.fit_transform(X, y)
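The "IQR fence" mentioned for `OutlierHandler` can be sketched in a few lines of NumPy. The 1.5 multiplier is the conventional choice and is assumed here, as is the decision to drop whole rows; the library may clip values instead. The helper name `iqr_mask` is hypothetical.

```python
import numpy as np

def iqr_mask(X, factor=1.5):
    """Boolean mask of rows whose every feature lies inside the IQR fences."""
    q1 = np.percentile(X, 25, axis=0)
    q3 = np.percentile(X, 75, axis=0)
    iqr = q3 - q1
    lower = q1 - factor * iqr
    upper = q3 + factor * iqr
    # A row is kept only if all of its columns fall within [lower, upper].
    return np.all((X >= lower) & (X <= upper), axis=1)

X = np.array([
    [1.0, 10.0],
    [2.0, 11.0],
    [3.0, 12.0],
    [4.0, 13.0],
    [100.0, 14.0],  # extreme value in the first column
])
mask = iqr_mask(X)
X_kept = X[mask]
print(mask, X_kept.shape)
```

If rows are dropped from `X`, the same mask has to be applied to `y` so that features and targets stay aligned.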
KNN
from mypackage import KNNClassifier, KNNRegressor
clf = KNNClassifier(k=5, distance="euclidean")
clf.fit(X_train, y_train_labels)
print(clf.score(X_test, y_test_labels))
reg = KNNRegressor(k=3, distance="manhattan")
reg.fit(X_train, y_train)
print(reg.score(X_test, y_test))
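For intuition, a brute-force k-nearest-neighbours regressor fits in a dozen lines. This is a sketch and not the package's implementation; the function names `pairwise_distance` and `knn_predict` are made up for the example, and only the two metrics named above are handled.

```python
import numpy as np

def pairwise_distance(A, B, metric="euclidean"):
    """Distance between every row of A and every row of B."""
    diff = A[:, None, :] - B[None, :, :]
    if metric == "euclidean":
        return np.sqrt((diff ** 2).sum(axis=2))
    if metric == "manhattan":
        return np.abs(diff).sum(axis=2)
    raise ValueError(f"unknown metric: {metric}")

def knn_predict(X_train, y_train, X_query, k=3, metric="euclidean"):
    """Average the targets of the k closest training points."""
    dist = pairwise_distance(X_query, X_train, metric)
    nearest = np.argsort(dist, axis=1)[:, :k]  # indices of the k nearest neighbours
    return y_train[nearest].mean(axis=1)

X_train = np.array([[0.0], [1.0], [2.0], [10.0]])
y_train = np.array([0.0, 1.0, 2.0, 10.0])
pred = knn_predict(X_train, y_train, np.array([[1.1]]), k=3, metric="manhattan")
print(pred)
```

A classifier would follow the same steps but replace the mean with a majority vote over the neighbours' labels.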
Neural Network — Perceptron
from mypackage import Perceptron
p = Perceptron(learning_rate=0.01, epochs=200, verbose=True, random_state=0)
p.fit(X_train, y_train)
preds = p.predict(X_test)
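A single-layer perceptron used for regression amounts to one linear unit trained with the delta rule. The sketch below assumes per-sample (stochastic) updates and small random initial weights; whether the library updates per sample or per batch is not stated, and `train_linear_unit` is a hypothetical name.

```python
import numpy as np

def train_linear_unit(X, y, learning_rate=0.01, epochs=200, random_state=0):
    """Train one linear neuron with per-sample delta-rule updates (illustrative)."""
    rng = np.random.default_rng(random_state)
    w = rng.normal(scale=0.01, size=X.shape[1])  # small random initial weights
    b = 0.0
    for _ in range(epochs):
        # Visit the samples in a fresh random order each epoch.
        for i in rng.permutation(X.shape[0]):
            error = (X[i] @ w + b) - y[i]
            w -= learning_rate * error * X[i]
            b -= learning_rate * error
    return w, b

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 3))
true_w = np.array([2.0, -1.5, 0.8])
y = X @ true_w + 0.5  # noiseless linear target

w, b = train_linear_unit(X, y)
print(w, b)
```

Fixing `random_state` makes both the initial weights and the shuffling order reproducible, which matches the purpose of that argument in the call above.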
Model Selection
from mypackage import CrossValidation, OLSRegression
cv = CrossValidation(OLSRegression(), k=5)
result = cv.evaluate(X, y)
print(f"CV mean R²: {result['mean_score']:.4f}")
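The mechanics of k-fold evaluation can be sketched as follows: shuffle the indices, split them into k folds, and score a model on each held-out fold. The `mean_score` key mirrors the one used above; the `scores` key, the shuffling, and the helper names are assumptions, and a pseudo-inverse least-squares fit stands in for the model.

```python
import numpy as np

def k_fold_indices(n_samples, k=5, random_state=0):
    """Yield (train_idx, test_idx) pairs covering every sample exactly once."""
    rng = np.random.default_rng(random_state)
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, folds[i]

def r2(y_true, y_pred):
    ss_res = ((y_true - y_pred) ** 2).sum()
    ss_tot = ((y_true - y_true.mean()) ** 2).sum()
    return 1.0 - ss_res / ss_tot

def cross_val_r2(X, y, k=5):
    """Fit least squares on k-1 folds and score R^2 on the held-out fold."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(y), k):
        design = np.column_stack([np.ones(len(train_idx)), X[train_idx]])
        beta = np.linalg.pinv(design) @ y[train_idx]
        held_out = np.column_stack([np.ones(len(test_idx)), X[test_idx]])
        scores.append(r2(y[test_idx], held_out @ beta))
    return {"scores": scores, "mean_score": float(np.mean(scores))}

rng = np.random.default_rng(42)
X = rng.standard_normal((100, 3))
y = X @ np.array([2.0, -1.5, 0.8]) + 0.3 * rng.standard_normal(100)

result = cross_val_r2(X, y, k=5)
print(f"CV mean R²: {result['mean_score']:.4f}")
```

Any preprocessing that learns statistics from the data, such as scaling, should be fitted inside each training fold to avoid leaking information from the held-out fold.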
Diagnostics
from mypackage import VIF, NormalityTest, HeteroscedasticityTest
# Multicollinearity
vif_scores = VIF().calculate(X)
print("VIF:", vif_scores)
# Normality of residuals (computed here from the Quick Start predictions)
residuals = y_test - y_pred
nt = NormalityTest()
print(nt.summary(residuals))
# Heteroscedasticity
ht = HeteroscedasticityTest()
print(ht.variance_check(residuals))
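The variance inflation factor of feature j is 1 / (1 - R_j²), where R_j² comes from regressing feature j on all the other features. The sketch below computes it directly with NumPy; it is an independent illustration, and the function name `vif` and its array return type are assumptions rather than the output format of `VIF().calculate`.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (illustrative)."""
    n_samples, n_features = X.shape
    out = np.empty(n_features)
    for j in range(n_features):
        target = X[:, j]
        # Regress column j on the remaining columns plus an intercept.
        others = np.column_stack([np.ones(n_samples), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, target, rcond=None)
        residual = target - others @ beta
        ss_res = residual @ residual
        ss_tot = ((target - target.mean()) ** 2).sum()
        out[j] = ss_tot / ss_res  # equals 1 / (1 - R^2)
    return out

rng = np.random.default_rng(0)
x0 = rng.standard_normal(500)
x1 = rng.standard_normal(500)
x2 = x0 + 0.1 * rng.standard_normal(500)  # nearly a copy of x0
X = np.column_stack([x0, x1, x2])

vif_scores = vif(X)
print("VIF:", vif_scores)
```

A common rule of thumb treats values above 5 or 10 as a sign of problematic collinearity; in this example the first and third columns should stand out while the second stays close to 1.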
Decomposition
from mypackage import PCA
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)
print("Explained variance:", pca.explained_variance_)
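As a rough illustration of what happens behind `fit_transform`, PCA can be computed from the SVD of the centred data. Whether the library uses an SVD or an eigendecomposition of the covariance matrix is not stated, so treat this as one possible approach; `pca_fit_transform` is a hypothetical helper, and the `n - 1` divisor is an assumption.

```python
import numpy as np

def pca_fit_transform(X, n_components):
    """Project X onto its leading principal components via SVD (illustrative)."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    _, singular_values, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    explained_variance = singular_values[:n_components] ** 2 / (X.shape[0] - 1)
    return X_centered @ components.T, components, explained_variance

rng = np.random.default_rng(0)
# Scale the columns differently so the components have distinct variances.
X = rng.standard_normal((150, 4)) * np.array([3.0, 2.0, 1.0, 0.5])

X_reduced, components, explained_variance = pca_fit_transform(X, n_components=2)
print("Explained variance:", explained_variance)
```

The explained variances equal the largest eigenvalues of the sample covariance matrix, which gives a quick way to cross-check any PCA implementation.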
Metrics Reference
| Function | Description |
|---|---|
| `mse(y_true, y_pred)` | Mean Squared Error |
| `rmse(y_true, y_pred)` | Root Mean Squared Error |
| `mae(y_true, y_pred)` | Mean Absolute Error |
| `r2_score(y_true, y_pred)` | Coefficient of determination R² |
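For reference, the four metrics reduce to one line of NumPy each. The functions below reuse the names from the table but are standalone re-implementations written for this example, not the package's source.

```python
import numpy as np

def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

def rmse(y_true, y_pred):
    return float(np.sqrt(mse(y_true, y_pred)))

def mae(y_true, y_pred):
    return float(np.mean(np.abs(y_true - y_pred)))

def r2_score(y_true, y_pred):
    # 1 minus the ratio of residual to total sum of squares.
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

print(mse(y_true, y_pred))   # 0.375
print(mae(y_true, y_pred))   # 0.5
print(rmse(y_true, y_pred))
print(r2_score(y_true, y_pred))
```

R² is 1 for a perfect fit, 0 for a model that always predicts the mean of `y_true`, and negative for anything worse than that.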
Running Tests
pip install pytest
pytest tests/ -v
Building & Publishing
# Build source distribution and wheel
# (needs the build and twine packages: pip install build twine)
python -m build
# Check distribution
twine check dist/*
# Upload to TestPyPI first
twine upload --repository testpypi dist/*
# Upload to PyPI
twine upload dist/*
Project Structure
mypackage-alina/
├── mypackage/
│ ├── __init__.py ← public API
│ ├── decomposition/
│ │ └── pca.py
│ ├── diagnostics/
│ │ ├── heteroscedasticity.py
│ │ ├── multicollinearity.py
│ │ └── normality.py
│ ├── feature_selection/
│ │ ├── backward_elimination.py
│ │ └── forward_selection.py
│ ├── metrics/
│ │ └── regression_metrics.py
│ ├── model_selection/
│ │ ├── cross_validation.py
│ │ └── train_test_split.py
│ ├── neighbors/
│ │ ├── distances.py
│ │ ├── knn_classifier.py
│ │ └── knn_regressor.py
│ ├── neural_networks/
│ │ └── perceptron.py
│ ├── preprocessing/
│ │ ├── encoder.py
│ │ ├── feature_selection.py
│ │ ├── missing_values.py
│ │ ├── outliers.py
│ │ ├── polynomial.py
│ │ ├── scaler.py
│ │ └── target_transformer.py
│ ├── regression/
│ │ ├── lasso.py
│ │ ├── ols.py
│ │ └── ridge.py
│ └── visualization/
│ └── plots.py
├── datasets/ ← external datasets (not shipped)
├── tests/
│ └── test_*.py
├── .gitignore
├── LICENSE
├── MANIFEST.in
├── README.md
├── pyproject.toml
└── setup.py
Future Improvements
- Logistic regression and multi-class classification
- Decision tree and random forest
- Support Vector Machine (SVM)
- Gradient boosting
- Mini-batch gradient descent
- Pipeline API (fit/transform chaining)
- Cross-validated hyperparameter search
- More formal statistical tests (Shapiro-Wilk, Breusch-Pagan, White)
- Sparse matrix support
License
MIT © 2026 Alina