A from-scratch machine-learning library: regression, preprocessing, KNN, neural networks, diagnostics, and more.

mypackage-alina

A from-scratch machine-learning library covering regression, preprocessing, nearest neighbours, neural networks, decomposition, diagnostics, and more. Everything is implemented in pure NumPy for educational clarity and portfolio value.

Version: 0.1.0


Installation

# From PyPI
pip install alina-mypackage

# With optional visualization support
pip install "alina-mypackage[viz]"

# Editable / development install from source
# (replace the URL with your repository if you host it)
git clone https://github.com/yourusername/mypackage-alina.git
cd mypackage-alina
pip install -e ".[dev]"

Importing from mypackage

# The distribution is installed as "alina-mypackage"; the import name is "mypackage".
from mypackage import OLSRegression, StandardScaler, KNNRegressor, PCA

Features

Module             Algorithms / Tools
regression         OLS, Ridge (L2), Lasso (L1)
neighbors          KNN Classifier, KNN Regressor
neural_networks    Single-layer Perceptron (regression)
preprocessing      StandardScaler, OneHotEncoder, TargetEncoder, MissingValueHandler, OutlierHandler, PolynomialFeatures, TargetTransformer, FeatureSelector
metrics            MSE, RMSE, MAE, R²
model_selection    train_test_split, CrossValidation (k-fold)
diagnostics        VIF, NormalityTest, HeteroscedasticityTest
feature_selection  ForwardSelection, BackwardElimination
decomposition      PCA
visualization      RegressionPlots (actual vs predicted, residuals, histogram)

Quick Start

import numpy as np
from mypackage import (
    OLSRegression,
    StandardScaler,
    train_test_split,
    mse, rmse, r2_score,
)

# --- Synthetic data ---
rng = np.random.default_rng(42)
X = rng.standard_normal((100, 3))
y = X @ np.array([2.0, -1.5, 0.8]) + rng.standard_normal(100) * 0.3

# --- Split ---
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# --- Preprocess ---
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test  = scaler.transform(X_test)

# --- Fit ---
model = OLSRegression()
model.fit(X_train, y_train)

# --- Evaluate ---
y_pred = model.predict(X_test)
print(f"MSE:  {mse(y_test, y_pred):.4f}")
print(f"RMSE: {rmse(y_test, y_pred):.4f}")
print(f"R²:   {r2_score(y_test, y_pred):.4f}")
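
The Preprocess step above fits the scaler on the training split and only applies it to the test split, so no test-set statistics leak into training. A minimal sketch of that idea in plain NumPy follows. The helper names `fit_scaler` and `apply_scaler` are hypothetical, and the use of the population standard deviation (`ddof=0`) is an assumption rather than a statement about how `StandardScaler` is implemented in this package.

```python
import numpy as np

def fit_scaler(X):
    """Learn per-feature mean and standard deviation from the training data."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)          # population std (ddof=0), an assumption
    std[std == 0.0] = 1.0        # constant columns: avoid division by zero
    return mean, std

def apply_scaler(X, mean, std):
    """Standardise X with statistics learned elsewhere (e.g. on the training set)."""
    return (np.asarray(X, dtype=float) - mean) / std

rng = np.random.default_rng(0)
X_train_demo = rng.normal(loc=5.0, scale=2.0, size=(80, 3))
X_test_demo = rng.normal(loc=5.0, scale=2.0, size=(20, 3))

mean, std = fit_scaler(X_train_demo)                      # statistics from train only
train_scaled = apply_scaler(X_train_demo, mean, std)
test_scaled = apply_scaler(X_test_demo, mean, std)        # reuse train statistics
```

The scaled training data has zero mean and unit variance per feature by construction; the test data is only approximately standardised, which is expected.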

Algorithms in Detail

Regression

OLS (Ordinary Least Squares)

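Solves the least-squares problem in closed form. The solution uses the Moore-Penrose pseudo-inverse, which remains defined when XᵀX is singular (for example, with perfectly collinear features), instead of an explicit matrix inverse.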

from mypackage import OLSRegression
model = OLSRegression()
model.fit(X_train, y_train)
print(model.coef_, model.intercept_)
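
For readers who want to see the closed-form idea spelled out, here is a minimal NumPy sketch. It assumes the intercept is handled by prepending a column of ones to the design matrix; the function name `ols_fit` is hypothetical and this is not the package's source code.

```python
import numpy as np

def ols_fit(X, y):
    """Least-squares fit via the Moore-Penrose pseudo-inverse."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the intercept is estimated with the weights.
    X_design = np.column_stack([np.ones(X.shape[0]), X])
    beta = np.linalg.pinv(X_design) @ y
    return beta[1:], beta[0]  # (coefficients, intercept)

rng = np.random.default_rng(0)
X_demo = rng.standard_normal((50, 2))
y_demo = 1.0 + X_demo @ np.array([2.0, -3.0])   # noise-free linear target
coef, intercept = ols_fit(X_demo, y_demo)
```

On noise-free data like this, the recovered coefficients and intercept match the generating values up to floating-point error.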

Ridge Regression (L2)

from mypackage import RidgeRegression
model = RidgeRegression(learning_rate=0.001, epochs=10_000, lambda_param=0.01)
model.fit(X_train, y_train)

Lasso Regression (L1)

from mypackage import LassoRegression
model = LassoRegression(learning_rate=0.001, epochs=10_000, lambda_param=0.01)
model.fit(X_train, y_train)
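
Ridge and Lasso take the same `learning_rate`, `epochs`, and `lambda_param` arguments, which suggests iterative training. The sketch below assumes batch gradient descent on the mean squared error plus a penalty on the weights; the README does not state the exact loss scaling or optimiser, so treat the function `penalized_gd` and its details as hypothetical.

```python
import numpy as np

def penalized_gd(X, y, penalty, learning_rate=0.01, epochs=5000, lambda_param=0.1):
    """Batch (sub)gradient descent on MSE plus an L1 or L2 penalty on the weights."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(epochs):
        error = X @ w + b - y
        grad_w = (2.0 / n_samples) * (X.T @ error)
        if penalty == "l2":
            grad_w += 2.0 * lambda_param * w       # gradient of lambda * ||w||_2^2
        elif penalty == "l1":
            grad_w += lambda_param * np.sign(w)    # subgradient of lambda * ||w||_1
        else:
            raise ValueError("penalty must be 'l1' or 'l2'")
        w -= learning_rate * grad_w
        # The intercept is left unpenalised (an assumption).
        b -= learning_rate * (2.0 / n_samples) * error.sum()
    return w, b

rng = np.random.default_rng(0)
X_demo = rng.standard_normal((200, 2))
y_demo = 0.5 + X_demo @ np.array([1.0, -2.0])

w_none, b_none = penalized_gd(X_demo, y_demo, "l2", lambda_param=0.0)
w_ridge, _ = penalized_gd(X_demo, y_demo, "l2", lambda_param=1.0)
w_lasso, _ = penalized_gd(X_demo, y_demo, "l1", lambda_param=0.5)
```

Both penalties shrink the weights relative to the unpenalised fit. Note that plain subgradient descent, as sketched here, drives small Lasso weights toward zero but rarely lands on exactly zero; a proximal (soft-thresholding) update would be needed for exact sparsity.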

Preprocessing

from mypackage import (
    StandardScaler,
    MissingValueHandler,
    OutlierHandler,
    OneHotEncoder,
    PolynomialFeatures,
    TargetTransformer,
    FeatureSelector,
)

# Handle missing values
handler = MissingValueHandler(strategy="mean")
X_clean = handler.fit_transform(X_with_nans)

# Remove outliers (IQR fence)
oh = OutlierHandler()
X_no_outliers = oh.fit_transform(X)

# One-hot encode categorical columns
enc = OneHotEncoder()
X_encoded = enc.fit_transform(X_categorical)

# Polynomial features up to degree 2
pf = PolynomialFeatures(degree=2)
X_poly = pf.fit_transform(X)

# Log-transform target (log1p is only defined for y > -1)
tt = TargetTransformer(method="log1p")
y_log = tt.fit_transform(y)
y_original = tt.inverse_transform(y_log)

# Select top-2 features by correlation
fs = FeatureSelector(method="correlation", k=2)
X_selected = fs.fit_transform(X, y)
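
The "IQR fence" mentioned above can be illustrated with a short sketch. It assumes the usual Tukey rule (fences at 1.5 × IQR beyond the quartiles) and that offending rows are dropped; `OutlierHandler` may use a different factor or clip values instead. The helper `iqr_mask` is a hypothetical name.

```python
import numpy as np

def iqr_mask(X, factor=1.5):
    """Boolean mask of rows whose every feature lies inside the IQR fences."""
    X = np.asarray(X, dtype=float)
    q1 = np.percentile(X, 25, axis=0)
    q3 = np.percentile(X, 75, axis=0)
    iqr = q3 - q1
    lower = q1 - factor * iqr
    upper = q3 + factor * iqr
    return np.all((X >= lower) & (X <= upper), axis=1)

X_demo = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])
mask = iqr_mask(X_demo)      # the row containing 100.0 falls outside the fences
X_kept = X_demo[mask]
```

When rows are dropped from `X`, the same mask has to be applied to `y` so that features and targets stay aligned.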

KNN

from mypackage import KNNClassifier, KNNRegressor

clf = KNNClassifier(k=5, distance="euclidean")
clf.fit(X_train, y_train_labels)
print(clf.score(X_test, y_test_labels))

reg = KNNRegressor(k=3, distance="manhattan")
reg.fit(X_train, y_train)
print(reg.score(X_test, y_test))
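
As a rough illustration of what a k-nearest-neighbours predictor does with the `k` and `distance` options, here is a brute-force NumPy sketch. The function names and the tie-breaking behaviour are assumptions, not the package's implementation.

```python
import numpy as np

def pairwise_distance(A, B, metric="euclidean"):
    """Distance between every row of A and every row of B."""
    diff = A[:, None, :] - B[None, :, :]
    if metric == "euclidean":
        return np.sqrt((diff ** 2).sum(axis=2))
    if metric == "manhattan":
        return np.abs(diff).sum(axis=2)
    raise ValueError("metric must be 'euclidean' or 'manhattan'")

def knn_predict(X_train, y_train, X_query, k=3, metric="euclidean", classify=False):
    """Average (regression) or majority-vote (classification) over the k nearest points."""
    X_train = np.asarray(X_train, dtype=float)
    X_query = np.asarray(X_query, dtype=float)
    y_train = np.asarray(y_train)
    dist = pairwise_distance(X_query, X_train, metric)
    nearest = np.argsort(dist, axis=1)[:, :k]       # indices of the k closest rows
    neighbour_targets = y_train[nearest]            # shape (n_query, k)
    if classify:
        votes = []
        for row in neighbour_targets:
            labels, counts = np.unique(row, return_counts=True)
            votes.append(labels[np.argmax(counts)])  # ties go to the smallest label
        return np.array(votes)
    return neighbour_targets.mean(axis=1)

X_demo = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]])
y_values = np.array([0.0, 1.0, 2.0, 10.0, 11.0, 12.0])
y_labels = np.array([0, 0, 0, 1, 1, 1])
X_query = np.array([[1.0], [11.0]])

reg_pred = knn_predict(X_demo, y_values, X_query, k=3, metric="manhattan")
clf_pred = knn_predict(X_demo, y_labels, X_query, k=3, classify=True)
```

The brute-force distance matrix costs O(n_query × n_train × n_features) memory, which is fine for small teaching datasets.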

Neural Network — Perceptron

from mypackage import Perceptron

p = Perceptron(learning_rate=0.01, epochs=200, verbose=True, random_state=0)
p.fit(X_train, y_train)
preds = p.predict(X_test)
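
A single-layer perceptron used for regression is a linear unit trained iteratively. The sketch below assumes per-sample updates with the delta (Widrow-Hoff) rule and a fresh shuffle each epoch; the README does not say whether `Perceptron` updates per sample or per batch, so the function `train_linear_unit` is a hypothetical stand-in.

```python
import numpy as np

def train_linear_unit(X, y, learning_rate=0.01, epochs=200, random_state=0):
    """Train y ≈ X @ w + b with per-sample delta-rule updates."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(random_state)
    w = rng.normal(scale=0.01, size=X.shape[1])   # small random initial weights
    b = 0.0
    history = []                                   # training MSE after each epoch
    for _ in range(epochs):
        for i in rng.permutation(len(y)):          # visit samples in random order
            error = X[i] @ w + b - y[i]
            w -= learning_rate * error * X[i]
            b -= learning_rate * error
        history.append(float(np.mean((X @ w + b - y) ** 2)))
    return w, b, history

rng = np.random.default_rng(42)
X_demo = rng.standard_normal((100, 3))
y_demo = X_demo @ np.array([2.0, -1.5, 0.8]) + rng.standard_normal(100) * 0.3
w, b, history = train_linear_unit(X_demo, y_demo)
```

Standardising the features first, as in the Quick Start, keeps the updates well-scaled and makes a fixed learning rate easier to choose.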

Model Selection

from mypackage import CrossValidation, OLSRegression

cv = CrossValidation(OLSRegression(), k=5)
result = cv.evaluate(X, y)
print(f"CV mean R²: {result['mean_score']:.4f}")
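
The splitting logic behind k-fold cross-validation can be sketched in a few lines. This assumes shuffled, near-equal folds; whether `CrossValidation` shuffles by default is not stated in this README, and `k_fold_indices` is a hypothetical helper.

```python
import numpy as np

def k_fold_indices(n_samples, k=5, shuffle=True, random_state=0):
    """Yield (train_idx, val_idx) pairs; each sample is validated exactly once."""
    indices = np.arange(n_samples)
    if shuffle:
        np.random.default_rng(random_state).shuffle(indices)
    folds = np.array_split(indices, k)             # k near-equal chunks
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, val_idx

splits = list(k_fold_indices(10, k=5))
```

A model is then fitted on `X[train_idx]` and scored on `X[val_idx]` for each pair, and the k scores are averaged to give a figure such as the mean R² printed above.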

Diagnostics

from mypackage import VIF, NormalityTest, HeteroscedasticityTest

# Multicollinearity
vif_scores = VIF().calculate(X)
print("VIF:", vif_scores)

# Normality of residuals (residuals = y_test - y_pred from a fitted model)
nt = NormalityTest()
print(nt.summary(residuals))

# Heteroscedasticity
ht = HeteroscedasticityTest()
print(ht.variance_check(residuals))
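
The variance inflation factor for feature j is 1 / (1 − R²ⱼ), where R²ⱼ comes from regressing feature j on all the other features. A minimal sketch of that definition follows; the function `vif` and the intercept handling are assumptions, not the package's `VIF` class.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X."""
    X = np.asarray(X, dtype=float)
    n_samples, n_features = X.shape
    scores = np.empty(n_features)
    for j in range(n_features):
        target = X[:, j]
        # Regress feature j on the remaining features plus an intercept.
        others = np.column_stack([np.ones(n_samples), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, target, rcond=None)
        residual = target - others @ beta
        ss_res = residual @ residual
        ss_tot = ((target - target.mean()) ** 2).sum()
        r2 = 1.0 - ss_res / ss_tot
        scores[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return scores

rng = np.random.default_rng(0)
x1 = rng.standard_normal(500)
x2 = rng.standard_normal(500)
x3 = x1 + 0.1 * rng.standard_normal(500)   # nearly a copy of x1
vif_scores = vif(np.column_stack([x1, x2, x3]))
```

Here the first and third columns get large scores because each is almost a linear function of the other, while the independent second column stays close to 1.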

Decomposition

from mypackage import PCA

pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)
print("Explained variance:", pca.explained_variance_)
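
One common way to compute PCA is through the singular value decomposition of the centred data. The sketch below takes that route; the package may instead use an eigendecomposition of the covariance matrix, and `pca_fit_transform` is a hypothetical name.

```python
import numpy as np

def pca_fit_transform(X, n_components=2):
    """Project X onto its top principal components via SVD."""
    X = np.asarray(X, dtype=float)
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    explained_variance = (S[:n_components] ** 2) / (X.shape[0] - 1)
    return X_centered @ components.T, components, explained_variance

rng = np.random.default_rng(0)
X_demo = rng.standard_normal((100, 3)) * np.array([3.0, 1.0, 0.1])
X_reduced, components, explained_variance = pca_fit_transform(X_demo, n_components=2)
```

The explained variance of each component equals the sample variance of the corresponding projected column, which gives a quick sanity check on any PCA implementation.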

Metrics Reference

Function                    Description
mse(y_true, y_pred)         Mean Squared Error
rmse(y_true, y_pred)        Root Mean Squared Error
mae(y_true, y_pred)         Mean Absolute Error
r2_score(y_true, y_pred)    Coefficient of determination R²
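
The standard definitions behind these four metrics fit in a few lines of NumPy. The functions below reuse the names from the table for readability, but they are standalone illustrations rather than the package's source.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean of squared errors."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean((y_true - y_pred) ** 2))

def rmse(y_true, y_pred):
    """Square root of the MSE, in the same units as the target."""
    return float(np.sqrt(mse(y_true, y_pred)))

def mae(y_true, y_pred):
    """Mean of absolute errors."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))

def r2_score(y_true, y_pred):
    """1 - SS_res / SS_tot; 1.0 is a perfect fit, 0.0 matches predicting the mean."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

y_true_demo = [3.0, -0.5, 2.0, 7.0]
y_pred_demo = [2.5, 0.0, 2.0, 8.0]
print(mse(y_true_demo, y_pred_demo))   # 0.375
print(mae(y_true_demo, y_pred_demo))   # 0.5
```

R² can be negative when a model predicts worse than the mean of `y_true`, and it is undefined when `y_true` is constant.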

Running Tests

pip install pytest
pytest tests/ -v

Building & Publishing

# Build source distribution and wheel
python -m build

# Check distribution
twine check dist/*

# Upload to TestPyPI first
twine upload --repository testpypi dist/*

# Upload to PyPI
twine upload dist/*

Project Structure

mypackage-alina/
├── mypackage/
│   ├── __init__.py            ← public API
│   ├── decomposition/
│   │   └── pca.py
│   ├── diagnostics/
│   │   ├── heteroscedasticity.py
│   │   ├── multicollinearity.py
│   │   └── normality.py
│   ├── feature_selection/
│   │   ├── backward_elimination.py
│   │   └── forward_selection.py
│   ├── metrics/
│   │   └── regression_metrics.py
│   ├── model_selection/
│   │   ├── cross_validation.py
│   │   └── train_test_split.py
│   ├── neighbors/
│   │   ├── distances.py
│   │   ├── knn_classifier.py
│   │   └── knn_regressor.py
│   ├── neural_networks/
│   │   └── perceptron.py
│   ├── preprocessing/
│   │   ├── encoder.py
│   │   ├── feature_selection.py
│   │   ├── missing_values.py
│   │   ├── outliers.py
│   │   ├── polynomial.py
│   │   ├── scaler.py
│   │   └── target_transformer.py
│   ├── regression/
│   │   ├── lasso.py
│   │   ├── ols.py
│   │   └── ridge.py
│   └── visualization/
│       └── plots.py
├── datasets/                  ← external datasets (not shipped)
├── tests/
│   └── test_*.py
├── .gitignore
├── LICENSE
├── MANIFEST.in
├── README.md
├── pyproject.toml
└── setup.py

Future Improvements

  • Logistic regression and multi-class classification
  • Decision tree and random forest
  • Support Vector Machine (SVM)
  • Gradient boosting
  • Mini-batch gradient descent
  • Pipeline API (fit/transform chaining)
  • Cross-validated hyperparameter search
  • More formal statistical tests (Shapiro-Wilk, Breusch-Pagan, White)
  • Sparse matrix support

License

MIT © 2026 Alina
