Basic ML algorithms library built from scratch (KNN + Linear Regression)

CoreLearn

A lightweight Python machine learning library built from scratch using only NumPy.
Implements KNN classification and Linear Regression.

Installation

# Install the package from PyPI with pip:
pip install coreLearn

After installation, import from anywhere:

from coreLearn import KNNClassifier, LinearRegression, Evaluator

Quick Start

from coreLearn import KNNClassifier, LinearRegression, Evaluator, accuracy, mae

# X_train, y_train, X_test, y_test are assumed to be loaded already
# (class labels for KNN, numeric targets for linear regression).

# --- KNN Classification ---
knn = KNNClassifier(k=5, distance="euclidean", n_jobs=2)
knn.fit(X_train, y_train)
knn_preds = knn.predict(X_test)
print(accuracy(y_test, knn_preds))

# --- Linear Regression ---
lr = LinearRegression(strategy="normal")
lr.fit(X_train, y_train)
lr_preds = lr.predict(X_test)
print(mae(y_test, lr_preds))

# --- Evaluator ---
print(Evaluator.evaluate_regression(y_test, lr_preds))
# {'mae': ..., 'mse': ..., 'rmse': ...}

print(Evaluator.evaluate_classification(y_test, knn_preds))
# {'accuracy': ..., 'precision': ..., 'recall': ..., 'f1': ...}

Package Structure

coreLearn/
├── __init__.py          ← Public API
├── base.py              ← Abstract base class — Template Method Pattern
├── distances.py         ← Distance metrics — Factory Pattern
├── knn.py               ← KNN Classifier — Recursion + Concurrency + OOP
├── linear_regression.py ← Linear Regression — Strategy Pattern + OOP
├── evaluator.py         ← Metric engine — Functional Programming
├── examples/
│   ├── demo_notebook.ipynb
│   ├── housing.csv
│   └── penguin.csv
└── tests/
    ├── test_knn.py
    ├── test_linear_regression.py
    ├── test_distances.py
    └── test_evaluator.py

Running Tests

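# Run from the repository root (assumed here to be the directory that contains the coreLearn/ package)
cd coreLearn/
pytest coreLearn/tests/ -v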

Learning Outcomes

1 — Object-Oriented Programming (OOP)

Files: base.py, knn.py, linear_regression.py, distances.py

Abstract Base Class & Inheritance

BaseModel is an abstract class that defines the contract every model must follow.
KNNClassifier and LinearRegression both inherit from it:

# base.py
from abc import ABC, abstractmethod

class BaseModel(ABC):
    @abstractmethod
    def fit(self, X, y) -> "BaseModel": ...

    @abstractmethod
    def predict(self, X) -> list: ...

# knn.py
class KNNClassifier(BaseModel):   # ← inheritance
    def fit(self, X, y): ...
    def predict(self, X): ...

# linear_regression.py
class LinearRegression(BaseModel):  # ← inheritance
    def fit(self, X, y): ...
    def predict(self, X): ...
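
As a minimal, self-contained sketch of what the abstract contract enforces: Python refuses to instantiate any class that still has an unimplemented abstract method. The names `Model`, `Incomplete`, `ConstantModel`, and `can_instantiate` are hypothetical stand-ins for illustration and are not part of coreLearn.

```python
from abc import ABC, abstractmethod


class Model(ABC):
    """Hypothetical stand-in for BaseModel, for illustration only."""

    @abstractmethod
    def fit(self, X, y) -> "Model": ...

    @abstractmethod
    def predict(self, X) -> list: ...


class Incomplete(Model):
    """Implements fit but not predict, so it remains abstract."""

    def fit(self, X, y):
        return self


class ConstantModel(Model):
    """Implements both methods: always predicts the first label it saw."""

    def fit(self, X, y):
        self._label = y[0]
        return self

    def predict(self, X):
        return [self._label for _ in X]


def can_instantiate(cls) -> bool:
    """Return True if cls() succeeds, False if the ABC machinery blocks it."""
    try:
        cls()
        return True
    except TypeError:
        return False
```

Only `ConstantModel` can be created here; `Model` and `Incomplete` both raise `TypeError` on instantiation, which is how a missing `fit` or `predict` is caught early rather than at prediction time.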

Polymorphism

Both models share the same interface — they can be used interchangeably:

for model in [KNNClassifier(k=3), LinearRegression()]:
    model.fit(X_train, y_train)   # same call, different behaviour
    model.predict(X_test)         # same call, different behaviour

Encapsulation

Internal state is marked as private by convention with a leading underscore. Users interact only through the public API:

# knn.py
self._metric = DistanceMetricFactory.create(distance)  # private
self._tree   = None                                     # private

# linear_regression.py — controlled read access via properties
@property
def coef_(self) -> np.ndarray:
    return self._weights[1:]

@property
def intercept_(self) -> float:
    return float(self._weights[0])

OptimizationStrategy, NormalEquationStrategy, and GradientDescentStrategy inside
linear_regression.py form an additional hierarchy demonstrating inheritance within the library.
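
The read-only property idea above can be sketched in isolation. `TinyLinearModel` is a hypothetical class that uses plain lists instead of NumPy arrays; it only illustrates that a property without a setter rejects assignment.

```python
class TinyLinearModel:
    """Hypothetical model storing [intercept, w1, w2, ...] in one internal list."""

    def __init__(self, weights):
        self._weights = list(weights)  # internal by convention

    @property
    def intercept_(self) -> float:
        return float(self._weights[0])

    @property
    def coef_(self) -> list:
        # Return a copy so callers cannot mutate the internal state.
        return list(self._weights[1:])


model = TinyLinearModel([2.0, 0.5, -1.0])

try:
    model.intercept_ = 10.0  # no setter is defined for this property
    assignment_blocked = False
except AttributeError:
    assignment_blocked = True
```

Reading `model.intercept_` and `model.coef_` works as usual, while the attempted assignment raises `AttributeError`, so the fitted weights can only change through `fit`-style methods.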

2 — Functional Programming

File: evaluator.py

Functions as First-Class Objects

Metric functions are stored in dictionaries as values and called dynamically:

# evaluator.py
# (requires `from typing import Callable`; the built-in `callable` is a function, not a type)
_regression_metrics: dict[str, Callable] = {
    "mae":  mae,
    "mse":  mse,
    "rmse": rmse,
}

@classmethod
def evaluate_regression(cls, y_true, y_pred) -> dict:
    # applies every registered function — no if/elif chain
    return {name: fn(y_true, y_pred) for name, fn in cls._regression_metrics.items()}
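
A standalone sketch of the same dictionary-dispatch idea, using only the standard library. The functions below are simplified stand-ins written for this example (the library's own implementations may differ, for instance by using NumPy), and `METRICS` and `evaluate` are hypothetical names.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(mse(y_true, y_pred))


# Functions are ordinary values, so they can be stored under string keys.
METRICS = {"mae": mae, "mse": mse, "rmse": rmse}


def evaluate(y_true, y_pred):
    """Apply every registered metric and collect the results by name."""
    return {name: fn(y_true, y_pred) for name, fn in METRICS.items()}


report = evaluate([3.0, 5.0, 7.0], [2.0, 5.0, 9.0])
```

With absolute errors of 1, 0, and 2, `report["mae"]` is 1.0 and `report["mse"]` is 5/3; adding a metric means adding one dictionary entry rather than another branch.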

Higher-Order Function — `register()`

Evaluator.register() accepts any callable and plugs it in at runtime.
This is the classic higher-order function pattern: a function (or method) that takes another function as an argument.

# Add a custom metric without modifying the Evaluator class
Evaluator.register(
    "max_error",
    lambda y_true, y_pred: max(abs(a - b) for a, b in zip(y_true, y_pred)),
    kind="regression",
)
result = Evaluator.evaluate_regression(y_test, y_pred)
print(result["max_error"])   # available immediately
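
One way such a `register()` method could be implemented is sketched below. `MiniEvaluator` is a hypothetical, simplified stand-in; the validation rules and the internal `_metrics` layout are assumptions, not the library's actual code.

```python
from typing import Callable, Dict


class MiniEvaluator:
    """Hypothetical, simplified stand-in for Evaluator."""

    _metrics: Dict[str, Dict[str, Callable]] = {
        "regression": {},
        "classification": {},
    }

    @classmethod
    def register(cls, name: str, fn: Callable, kind: str) -> None:
        """Higher-order method: takes a function and stores it for later calls."""
        if kind not in cls._metrics:
            raise ValueError(f"unknown kind: {kind!r}")
        if not callable(fn):
            raise TypeError("fn must be callable")
        cls._metrics[kind][name] = fn

    @classmethod
    def evaluate(cls, kind: str, y_true, y_pred) -> dict:
        return {n: f(y_true, y_pred) for n, f in cls._metrics[kind].items()}


MiniEvaluator.register(
    "max_error",
    lambda y_true, y_pred: max(abs(a - b) for a, b in zip(y_true, y_pred)),
    kind="regression",
)
result = MiniEvaluator.evaluate("regression", [1.0, 2.0, 3.0], [1.5, 2.0, 5.0])
```

Here `result` contains a single `max_error` entry of 2.0. Rejecting unknown `kind` values at registration time is a design choice of this sketch, so that a typo fails immediately instead of silently creating an unused registry.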

Pure Functions

mae, mse, rmse, accuracy, precision, recall, f1_score are all pure functions:

No side effects
No mutation of inputs
Same inputs always produce the same output

from coreLearn import mae, accuracy
mae([1.0, 2.0, 3.0], [1.5, 2.5, 3.5])   # → 0.5  (always)
accuracy([0, 1, 1], [0, 1, 0])           # → 0.666... (2 of 3 correct, always)
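
A small sketch of what purity looks like for the classification metrics. It assumes binary labels with `1` as the positive class (the `positive` parameter is hypothetical); the library's own definitions, for example how it averages over several classes, may differ.

```python
def precision(y_true, y_pred, positive=1):
    """Share of predicted positives that are truly positive."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    predicted_pos = sum(1 for p in y_pred if p == positive)
    return true_pos / predicted_pos if predicted_pos else 0.0


def recall(y_true, y_pred, positive=1):
    """Share of actual positives that were predicted as positive."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    actual_pos = sum(1 for t in y_true if t == positive)
    return true_pos / actual_pos if actual_pos else 0.0


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall."""
    p = precision(y_true, y_pred, positive)
    r = recall(y_true, y_pred, positive)
    return 2 * p * r / (p + r) if (p + r) else 0.0


y_true = [0, 1, 1, 1]
y_pred = [0, 1, 0, 1]
scores = (
    precision(y_true, y_pred),
    recall(y_true, y_pred),
    f1_score(y_true, y_pred),
)
```

Each function only reads its arguments and returns a number: `y_true` and `y_pred` are unchanged afterwards, and calling the functions again with the same lists gives the same `scores`.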

3 — Concurrency

File: knn.py — KNNClassifier.predict()

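KNNClassifier uses ProcessPoolExecutor to classify test samples in parallel across
multiple CPU processes. Unlike threads, each worker runs in its own process with its
own interpreter and GIL, which allows true parallelism for CPU-bound work. The trade-off
is that the arguments of every task (here the tree, the sample, k, and the metric) must
be pickled and sent to a worker.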

# knn.py
def predict(self, X) -> list:
    ...
    if self.n_jobs == 1:
        # sequential — no overhead for small datasets
        return [self._predict_one(x) for x in samples]

    # parallel — distribute samples across n_jobs worker processes
    args = [(self._tree, x, self.k, self._metric) for x in samples]
    with ProcessPoolExecutor(max_workers=self.n_jobs) as executor:
        return list(executor.map(_predict_worker, args))

# n_jobs=1  → sequential (default, safe for notebooks)
knn = KNNClassifier(k=5, n_jobs=1)

# n_jobs=4  → 4 parallel worker processes
knn = KNNClassifier(k=5, n_jobs=4)
knn.fit(X_train, y_train)
preds = knn.predict(X_test)

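Note: on platforms that start worker processes with the spawn method (Windows, and macOS
since Python 3.8), scripts that call predict() with n_jobs > 1 must do so under an
if __name__ == "__main__": guard. The n_jobs=1 default never starts a worker process,
so it is safe everywhere.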
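
A self-contained sketch of the sequential/parallel split together with the guard. It uses a brute-force 1-nearest-neighbour worker instead of the KD-tree, and the names `_nearest_label`, `predict_all`, `TRAIN`, and `SAMPLES` are hypothetical.

```python
from concurrent.futures import ProcessPoolExecutor


def _nearest_label(args):
    """Worker: return the label of the training point closest to x (1-NN)."""
    train, x = args
    _, label = min(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], x)),
    )
    return label


def predict_all(train, samples, n_jobs=1):
    """Classify every sample, sequentially or across n_jobs worker processes."""
    args = [(train, x) for x in samples]
    if n_jobs == 1:
        return [_nearest_label(a) for a in args]
    # The worker must be a module-level function so that it can be pickled.
    with ProcessPoolExecutor(max_workers=n_jobs) as executor:
        return list(executor.map(_nearest_label, args))


TRAIN = [((0.0, 0.0), "a"), ((1.0, 1.0), "a"), ((5.0, 5.0), "b"), ((6.0, 5.0), "b")]
SAMPLES = [(0.5, 0.2), (5.5, 5.1), (0.9, 1.2)]

if __name__ == "__main__":
    # Spawned workers re-import this module; the guard keeps them from
    # starting a new pool of their own when they do.
    print(predict_all(TRAIN, SAMPLES, n_jobs=2))
```

`executor.map` returns results in input order, so the parallel path yields the same list as the sequential one. For inputs this small, process start-up and pickling cost far more than the computation itself.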

4 — Recursion

File: knn.py — KDTree

The KD-Tree data structure is built and searched using direct recursion.
Both _build and _search call themselves on a strictly smaller subproblem each time.

`_build` — Recursive Tree Construction

Base case: empty data → return None.
Recursive case: split on the median, call _build on each half with depth + 1.

# knn.py
def _build(self, data: list, depth: int):
    if not data:          # ← base case
        return None
    axis = depth % len(data[0][0])
    data.sort(key=lambda item: item[0][axis])
    mid = len(data) // 2
    return KDNode(
        point = data[mid][0],
        label = data[mid][1],
        left  = self._build(data[:mid],     depth + 1),  # ← recursion
        right = self._build(data[mid + 1:], depth + 1),  # ← recursion
    )

`_search` — Recursive Nearest-Neighbour Search

Base case: node is None → return.
Recursive case: visit the near branch, then prune and optionally visit the far branch.

# knn.py
def _search(self, node, target, k, metric, depth, best):
    if node is None:      # ← base case
        return
    dist = metric(target, node.point)
    # update best list, then choose near/far children and diff for this axis (elided) ...
    self._search(near, target, k, metric, depth + 1, best)  # ← recursion
    if len(best) < k or abs(diff) < best[-1][0]:
        self._search(far, target, k, metric, depth + 1, best)  # ← recursion (pruned)

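Pruning: the abs(diff) < best[-1][0] condition skips the far branch when the splitting
plane is at least as far from the target as the current k-th best neighbour, so the far
side cannot contain a closer point. For low-dimensional data this gives O(log n) average
search time; in the worst case the search still visits every node.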
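
The build and search steps can be put together in a runnable sketch. This is not the library's code: it uses a `namedtuple` node, Euclidean distance via `math.dist`, and fills in the elided `near`, `far`, and `diff` with the usual definitions, all of which are assumptions for illustration.

```python
import math
from collections import namedtuple

KDNode = namedtuple("KDNode", "point label left right")


def build(data, depth=0):
    """data: list of (point, label) pairs. Returns the root KDNode or None."""
    if not data:  # base case
        return None
    axis = depth % len(data[0][0])
    data = sorted(data, key=lambda item: item[0][axis])  # copy; caller's list is untouched
    mid = len(data) // 2
    return KDNode(
        data[mid][0],
        data[mid][1],
        build(data[:mid], depth + 1),
        build(data[mid + 1:], depth + 1),
    )


def search(node, target, k, depth, best):
    """Collect the k nearest (distance, label) pairs in best, sorted ascending."""
    if node is None:  # base case
        return
    best.append((math.dist(target, node.point), node.label))
    best.sort(key=lambda pair: pair[0])
    del best[k:]  # keep only the k closest seen so far

    axis = depth % len(target)
    diff = target[axis] - node.point[axis]  # signed distance to the splitting plane
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)

    search(near, target, k, depth + 1, best)
    # Visit the far side only if it could still hold a closer point.
    if len(best) < k or abs(diff) < best[-1][0]:
        search(far, target, k, depth + 1, best)


def k_nearest(points, target, k):
    best = []
    search(build(points), target, k, 0, best)
    return best


POINTS = [((2.0, 3.0), "a"), ((5.0, 4.0), "b"), ((9.0, 6.0), "a"),
          ((4.0, 7.0), "b"), ((8.0, 1.0), "a"), ((7.0, 2.0), "b")]
neighbours = k_nearest(POINTS, (9.0, 2.0), k=2)
```

For the target `(9.0, 2.0)` the two nearest points are `(8.0, 1.0)` and `(7.0, 2.0)`, and the whole left subtree of the root is pruned because the splitting plane at x = 7 is no closer than the second-best neighbour. Re-sorting `best` on every visit keeps the sketch short; a bounded heap would be the more efficient choice for large k.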

5 — SOLID Principles

Files: all modules

S — Single Responsibility

Every class has exactly one reason to change:

Class	Sole Responsibility
`BaseModel`	Define the common model contract
`KDTree`	Spatial nearest-neighbour search
`KNNClassifier`	KNN classification logic
`LinearRegression`	Linear regression logic
`NormalEquationStrategy`	Closed-form weight computation
`GradientDescentStrategy`	Iterative gradient-based weight computation
`DistanceMetricFactory`	Instantiate distance metric objects by name
`Evaluator`	Compute and manage evaluation metrics

O — Open/Closed

Classes are open for extension, closed for modification.
New metrics and distance functions can be added without editing any existing class:

# Add a new metric — Evaluator source code untouched
Evaluator.register("r2", lambda t, p: ..., kind="regression")

# Add a new distance — KNNClassifier source code untouched
DistanceMetricFactory.register("chebyshev", ChebyshevDistance)
knn = KNNClassifier(k=3, distance="chebyshev")
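
`ChebyshevDistance` is not defined in the snippet above. A possible definition is sketched here against minimal stand-ins for `DistanceMetric` and `DistanceMetricFactory` that mirror the interfaces shown in this README; the stand-ins and the Chebyshev class itself are assumptions for illustration.

```python
from abc import ABC, abstractmethod


class DistanceMetric(ABC):
    """Stand-in for the abstract metric interface: a single compute() method."""

    @abstractmethod
    def compute(self, a: list, b: list) -> float: ...


class EuclideanDistance(DistanceMetric):
    def compute(self, a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


class DistanceMetricFactory:
    """Stand-in factory: maps names to metric classes."""

    _registry = {"euclidean": EuclideanDistance}

    @classmethod
    def create(cls, name: str) -> DistanceMetric:
        return cls._registry[name]()

    @classmethod
    def register(cls, name: str, metric_class: type) -> None:
        cls._registry[name] = metric_class


class ChebyshevDistance(DistanceMetric):
    """Largest absolute difference along any single axis."""

    def compute(self, a, b):
        return float(max(abs(x - y) for x, y in zip(a, b)))


# Extension happens from the outside; no existing class is edited.
DistanceMetricFactory.register("chebyshev", ChebyshevDistance)
metric = DistanceMetricFactory.create("chebyshev")
```

For `[0, 0]` and `[3, -4]` this metric gives 4.0, where the Euclidean distance would be 5.0. The per-axis difference never exceeds the Chebyshev distance, so a pruning test of the form `abs(diff) < best[-1][0]` stays valid with it.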

L — Liskov Substitution

Any BaseModel subclass can replace BaseModel without breaking callers. The metric is
passed in separately, because accuracy is only meaningful for classifiers and mae for regressors:

def train_and_score(model: BaseModel, metric, X_train, y_train, X_test, y_test):
    preds = model.fit_predict(X_train, y_train, X_test)
    return metric(y_test, preds)

train_and_score(KNNClassifier(k=3), accuracy, ...)   # works
train_and_score(LinearRegression(), mae, ...)        # works

I — Interface Segregation

DistanceMetric exposes only what is needed — a single compute() method.
Implementors are not forced to implement anything they do not use:

# distances.py
class DistanceMetric(ABC):
    @abstractmethod
    def compute(self, a: list, b: list) -> float: ...
    # nothing else required

D — Dependency Inversion

LinearRegression depends on the abstraction OptimizationStrategy,
not on any concrete strategy class:

# linear_regression.py
self._weights = self._strategy.fit(X_b, y)
#               ↑ OptimizationStrategy interface — concrete class unknown here

6 — Architectural & Design Patterns

The library is organised into four layers:

Core layer (base.py, distances.py): abstractions and shared contracts
Algorithm layer (knn.py, linear_regression.py): concrete ML algorithms
Evaluation layer (evaluator.py): metric computation
Public API (__init__.py): single entry point, re-exports everything

Pattern 1 — Template Method (`base.py`)

fit_predict defines the fixed skeleton (fit → predict).
Subclasses fill in each step without altering the sequence:

# base.py
def fit_predict(self, X_train, y_train, X_test) -> list:
    self.fit(X_train, y_train)   # ← step 1: implemented by subclass
    return self.predict(X_test)  # ← step 2: implemented by subclass

Every model gets fit_predict for free through inheritance.
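
A minimal sketch of the inherited skeleton in action. `Model` stands in for `BaseModel`, and `MeanRegressor` is a hypothetical subclass that records the order in which its steps are called.

```python
from abc import ABC, abstractmethod


class Model(ABC):
    """Stand-in base class: fixes the fit-then-predict sequence."""

    @abstractmethod
    def fit(self, X, y): ...

    @abstractmethod
    def predict(self, X): ...

    def fit_predict(self, X_train, y_train, X_test):
        self.fit(X_train, y_train)    # step 1, supplied by the subclass
        return self.predict(X_test)   # step 2, supplied by the subclass


class MeanRegressor(Model):
    """Hypothetical model that always predicts the training mean."""

    def __init__(self):
        self.calls = []  # records which steps ran, in order

    def fit(self, X, y):
        self.calls.append("fit")
        self._mean = sum(y) / len(y)
        return self

    def predict(self, X):
        self.calls.append("predict")
        return [self._mean for _ in X]


model = MeanRegressor()
preds = model.fit_predict([[1], [2], [3]], [2.0, 4.0, 6.0], [[9], [10]])
```

`MeanRegressor` never defines `fit_predict`, yet the call works and `model.calls` shows the steps ran as fit, then predict. The subclass controls what each step does, but not the order.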

Pattern 2 — Strategy (`linear_regression.py`)

The optimisation algorithm is swapped at construction time.
LinearRegression.fit() never knows which concrete strategy it is using:

lr_ne = LinearRegression(strategy="normal")           # uses NormalEquationStrategy
lr_gd = LinearRegression(strategy="gradient_descent") # uses GradientDescentStrategy

# Both models have the same interface — caller code is identical
lr_ne.fit(X_train, y_train)
lr_gd.fit(X_train, y_train)

To add a third optimiser (e.g. Adam), only a new OptimizationStrategy subclass is needed.
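
The pattern can be sketched for the single-feature case using only the standard library. The class names reuse the README's vocabulary but are simplified stand-ins: the constructor takes a strategy object rather than a string, and the closed-form solution is the one-feature least-squares formula rather than a matrix normal equation. All of this is assumed for illustration.

```python
from abc import ABC, abstractmethod


class OptimizationStrategy(ABC):
    @abstractmethod
    def fit(self, x, y):
        """Return (intercept, slope) for the line y = intercept + slope * x."""


class ClosedFormStrategy(OptimizationStrategy):
    """Least-squares solution for one feature: slope = Sxy / Sxx."""

    def fit(self, x, y):
        n = len(x)
        mean_x, mean_y = sum(x) / n, sum(y) / n
        sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
        sxx = sum((a - mean_x) ** 2 for a in x)
        slope = sxy / sxx
        return mean_y - slope * mean_x, slope


class GradientDescentStrategy(OptimizationStrategy):
    """Batch gradient descent on the mean squared error."""

    def __init__(self, learning_rate=0.01, epochs=1000):
        self.learning_rate = learning_rate
        self.epochs = epochs

    def fit(self, x, y):
        n = len(x)
        b0, b1 = 0.0, 0.0
        for _ in range(self.epochs):
            errors = [b0 + b1 * a - t for a, t in zip(x, y)]
            grad0 = 2 * sum(errors) / n
            grad1 = 2 * sum(e * a for e, a in zip(errors, x)) / n
            b0 -= self.learning_rate * grad0
            b1 -= self.learning_rate * grad1
        return b0, b1


class SimpleLinearRegression:
    """Depends only on the OptimizationStrategy interface."""

    def __init__(self, strategy: OptimizationStrategy):
        self._strategy = strategy

    def fit(self, x, y):
        self._intercept, self._slope = self._strategy.fit(x, y)
        return self

    def predict(self, x):
        return [self._intercept + self._slope * a for a in x]


x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 5.0, 7.0, 9.0]  # exactly y = 1 + 2x
closed = SimpleLinearRegression(ClosedFormStrategy()).fit(x, y)
gd = SimpleLinearRegression(GradientDescentStrategy(learning_rate=0.05, epochs=2000)).fit(x, y)
```

Both models recover the line y = 1 + 2x: the closed-form one exactly on this data, the gradient-descent one to within a small tolerance that depends on the learning rate and epoch count. `SimpleLinearRegression` contains no branch on the strategy type, which is what lets a further optimiser be added as one more subclass.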

Pattern 3 — Factory (`distances.py`)

DistanceMetricFactory centralises object creation.
KNNClassifier never imports EuclideanDistance or ManhattanDistance directly:

# distances.py
class DistanceMetricFactory:
    _registry = {"euclidean": EuclideanDistance, "manhattan": ManhattanDistance}

    @classmethod
    def create(cls, name: str) -> DistanceMetric:
        return cls._registry[name]()   # create and return

    @classmethod
    def register(cls, name: str, metric_class: type) -> None:
        cls._registry[name] = metric_class  # extend without modifying

# knn.py — only depends on the factory, not the concrete classes
self._metric = DistanceMetricFactory.create(distance)

API Reference

`KNNClassifier`

Parameter	Type	Default	Description
`k`	`int`	`5`	Number of neighbours
`distance`	`str`	`"euclidean"`	`"euclidean"` or `"manhattan"` (or any registered name)
`n_jobs`	`int`	`1`	Worker processes for prediction (`1` = sequential)

`LinearRegression`

Parameter	Type	Default	Description
`strategy`	`str`	`"normal"`	`"normal"` (closed-form) or `"gradient_descent"`
`learning_rate`	`float`	`0.01`	Learning rate — gradient descent only
`epochs`	`int`	`1000`	Iterations — gradient descent only

`Evaluator`

Method	Description
`evaluate_regression(y_true, y_pred)`	Returns a dict with keys `mae`, `mse`, `rmse`
`evaluate_classification(y_true, y_pred)`	Returns a dict with keys `accuracy`, `precision`, `recall`, `f1`
`register(name, fn, kind)`	Add a custom metric at runtime

Standalone metric functions

from coreLearn import accuracy, mae, mse, rmse, precision, recall, f1_score

Dependencies

Package	Purpose
`numpy`	Matrix operations, vectorised arithmetic
`pytest`	Unit testing
`scikit-learn`	Datasets and preprocessing in examples only
`pandas`	Data loading in examples only
`matplotlib`	Visualisation in examples only

Project details

Release history

Version	Release date
0.1.4	Jun 6, 2026 (this version)
0.1.3	Jun 6, 2026
0.1.2	May 25, 2026
0.1.1	May 25, 2026
0.1.0	May 24, 2026

