A modular, object-oriented framework for machine learning and data preprocessing

These details have not been verified by PyPI

Project description

Machine Learning & Data Preprocessing Library

Introduction

This is a simple machine learning library consisting of Linear Regression, a KNN classifier, and several data preprocessing algorithms, all implemented from scratch on top of the NumPy and pandas libraries.

Mapping Core Learning Outcomes

The project applies all six required patterns. Each one is explained below.

1. Object-Oriented Programming (OOP)

Where: In core.py and data.py.
How:
- Inheritance & Abstraction: Employs abstract base classes (BaseAlgorithm, RegressionStrategy, DistanceMetric, DataLoader, DataCreator, ImputeStrategy, EncodingStrategy) to enforce blueprints.
- Polymorphism: Concrete implementations substitute for their base classes at runtime. For example, LinearRegression delegates .train() to whichever regression strategy it was given, so the training algorithm can change without any change to LinearRegression itself.
- Encapsulation: Internal state is kept behind protected attributes. In data.py, the raw dataframe is stored in the protected attribute self._data and exposed through a @property getter.
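
The three ideas above can be sketched together in a few lines. This is an illustration only, not the code in data.py: the method names fill_value and impute are hypothetical, and a plain dict of lists stands in for the pandas dataframe so the example runs with the standard library alone.

```python
from abc import ABC, abstractmethod
from statistics import mean, median


class ImputeStrategy(ABC):
    """Abstract blueprint: subclasses decide which value replaces a gap."""

    @abstractmethod
    def fill_value(self, values: list[float]) -> float:
        """Return the replacement for missing entries, given the known values."""


class MeanImputer(ImputeStrategy):
    def fill_value(self, values: list[float]) -> float:
        return mean(values)


class MedianImputer(ImputeStrategy):
    def fill_value(self, values: list[float]) -> float:
        return median(values)


class DataProcessor:
    def __init__(self, data: dict[str, list]) -> None:
        # Protected attribute: callers are expected to go through `data`.
        # (Assumption: a dict of columns stands in for the dataframe.)
        self._data = {name: list(column) for name, column in data.items()}

    @property
    def data(self) -> dict[str, list]:
        # Getter only; with no setter, `processor.data = ...` raises AttributeError.
        return self._data

    def impute(self, column: str, strategy: ImputeStrategy) -> None:
        # Polymorphism: any ImputeStrategy subclass works here unchanged.
        known = [v for v in self._data[column] if v is not None]
        fill = strategy.fill_value(known)
        self._data[column] = [fill if v is None else v for v in self._data[column]]


processor = DataProcessor({"age": [20.0, None, 40.0, 90.0]})
processor.impute("age", MeanImputer())
print(processor.data["age"])  # [20.0, 50.0, 40.0, 90.0]
```

Instantiating ImputeStrategy directly raises TypeError, which is how the abstract base class enforces the blueprint.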

2. Functional Programming

Where: In core.py and utils.py.
How:
- Pure Functions & Lambda: evaluate_model does not modify external state and depends only on its input arguments; it computes the mean squared error with a lambda.
- Higher-Order Functions & Map/Reduce:
  - reduce combined with lambda is used inside evaluate_model to sum squared errors.
  - map is used inside series_to_ndarray to cast the values of a pandas Series to float.
  - apply_pipeline uses reduce to apply a list of transformation callables to the data in sequence (reduce(lambda d, func: func(d), transformations, data)).
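
A minimal sketch of these functions follows. Only the reduce expression in apply_pipeline is taken from the description above; the signatures are assumptions, and to_floats is a hypothetical list-based stand-in for series_to_ndarray so that the example needs no third-party packages.

```python
from functools import reduce


def evaluate_model(y_true, y_pred):
    """Mean squared error, computed from the arguments alone (no external state)."""
    squared_error = lambda pair: (pair[0] - pair[1]) ** 2
    total = reduce(lambda acc, pair: acc + squared_error(pair), zip(y_true, y_pred), 0.0)
    return total / len(y_true)


def to_floats(values):
    """Hypothetical stand-in for series_to_ndarray: cast every value with map."""
    return list(map(float, values))


def apply_pipeline(data, transformations):
    """Feed data through each callable in order, i.e. f3(f2(f1(data)))."""
    return reduce(lambda d, func: func(d), transformations, data)


double = lambda values: [v * 2 for v in values]

print(apply_pipeline(["1", "2", "3"], [to_floats, double]))  # [2.0, 4.0, 6.0]
print(evaluate_model([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```

Because reduce starts from data and applies one callable per step, an empty transformation list returns the input unchanged.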

3. Concurrency (Multi-threading)

Where: Implemented in core.py inside the KNNClassifier class.
How:
- Predicting classes one sample at a time over a large feature set is compute-bound. The predict method creates a separate threading.Thread for every sample to be evaluated.
- The _predict_single worker computes the distances for its own row concurrently with the other workers and stores its output in a shared, pre-allocated numpy results array (results[index]).
- All threads are started in a t.start() loop and then joined in a t.join() loop, which blocks the main thread until every prediction has finished.
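
The start/join structure described above might look roughly like this. It is a sketch under assumptions, not the code in core.py: the train signature, the k parameter, the majority vote, and the inlined Euclidean distance are all illustrative choices.

```python
import threading
from collections import Counter

import numpy as np


class KNNClassifier:
    def __init__(self, k: int = 3) -> None:
        self.k = k

    def train(self, X, y) -> None:
        # KNN is lazy: training only stores the data.
        self._X = np.asarray(X, dtype=float)
        self._y = np.asarray(y)

    def _predict_single(self, index, sample, results) -> None:
        # Euclidean distance from this sample to every training row.
        distances = np.sqrt(((self._X - sample) ** 2).sum(axis=1))
        nearest = np.argsort(distances)[: self.k]
        # Majority vote; each thread writes only to its own slot.
        votes = Counter(self._y[nearest].tolist())
        results[index] = votes.most_common(1)[0][0]

    def predict(self, X) -> np.ndarray:
        X = np.asarray(X, dtype=float)
        results = np.empty(len(X), dtype=self._y.dtype)  # shared, pre-allocated
        threads = [
            threading.Thread(target=self._predict_single, args=(i, row, results))
            for i, row in enumerate(X)
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()  # block until every worker has written its result
        return results


model = KNNClassifier(k=1)
model.train([[0, 0], [0, 1], [5, 5], [6, 5]], [0, 0, 1, 1])
print(model.predict([[0.1, 0.2], [5.5, 5.0]]))
```

No lock is needed in this sketch because every worker writes to a different index of results. Note that CPython's global interpreter lock prevents threads from executing Python bytecode in parallel, so one thread per sample mainly demonstrates the synchronization pattern.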

4. Recursion / Dynamic Programming

Where: In core.py inside the EuclideanDistance class.
How:
- Distance metrics are usually computed with nested loops or high-level library functions. This implementation instead accumulates the element-wise squared differences with a custom recursive function, recursive_sum_sq(a, b, idx).
- It adds the squared difference at each index until it reaches the base case (idx == len(a)); the square root of the accumulated sum is the resulting distance.
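
One way to write such a recursion is sketched below. The function name and base case follow the description above; the method name compute, and taking the square root outside the recursion rather than in the base case, are assumptions made for clarity.

```python
import math


class EuclideanDistance:
    def compute(self, a, b) -> float:
        def recursive_sum_sq(a, b, idx):
            # Base case: every index has been consumed, nothing left to add.
            if idx == len(a):
                return 0.0
            # Recursive case: this index's squared difference plus the rest.
            return (a[idx] - b[idx]) ** 2 + recursive_sum_sq(a, b, idx + 1)

        return math.sqrt(recursive_sum_sq(a, b, 0))


print(EuclideanDistance().compute([0, 0], [3, 4]))  # 5.0
```

The recursion depth equals the vector length, so in CPython very long vectors will exceed the interpreter's recursion limit (see sys.getrecursionlimit()) and raise RecursionError.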

5. SOLID Principles

Where: In core.py and data.py.
How:
- Single Responsibility Principle (SRP): Each class has exactly one responsibility. CSVLoader only ingests data; MeanImputer only fills missing values; DataProcessor only manipulates data.
- Open/Closed Principle (OCP): The system is open for extension but closed for modification. Introducing a new distance metric (e.g., Cosine Distance) requires subclassing DistanceMetric without touching KNNClassifier.
- Liskov Substitution Principle (LSP): Derived classes are completely interchangeable with their abstractions. Any encoder (LabelEncoder, OneHotEncoder, TargetEncoder) fulfills the signature constraints expected by DataProcessor.
- Interface Segregation Principle (ISP): Interfaces stay small and focused. RegressionStrategy requires a single method (train), so implementers are not forced to provide unrelated methods.
- Dependency Inversion Principle (DIP): High-level objects depend on abstractions rather than low-level concrete modules. LinearRegression depends only on the RegressionStrategy interface, which decouples the model from any specific training algorithm.
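
The Open/Closed example above (adding Cosine Distance) can be sketched as follows. The compute method name is an assumption, and nearest_index is a hypothetical stand-in for the neighbor lookup inside KNNClassifier; the point is that the consumer is written once against DistanceMetric and never edited when a metric is added.

```python
import math
from abc import ABC, abstractmethod


class DistanceMetric(ABC):
    @abstractmethod
    def compute(self, a, b) -> float:
        """Return the distance between two equal-length vectors."""


class ManhattanDistance(DistanceMetric):
    def compute(self, a, b) -> float:
        return sum(abs(x - y) for x, y in zip(a, b))


class CosineDistance(DistanceMetric):
    """New metric added by subclassing only (assumes non-zero vectors)."""

    def compute(self, a, b) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return 1.0 - dot / norms


def nearest_index(sample, rows, metric: DistanceMetric) -> int:
    # Depends only on the abstraction, so it works with any metric unchanged.
    return min(range(len(rows)), key=lambda i: metric.compute(sample, rows[i]))


rows = [[10, 0], [1, 1]]
print(nearest_index([1, 0], rows, ManhattanDistance()))  # 1
print(nearest_index([1, 0], rows, CosineDistance()))     # 0
```

The two metrics pick different neighbors for the same sample: Manhattan distance favors the row that is close in absolute terms, while cosine distance favors the row pointing in the same direction.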

6. Architectural & Design Patterns

Where: Full design of data.py and core.py.
How:
- Pipeline Architecture: DataPipeline combines file checking, concrete factory creation, loading, and feature preparation behind a single linear entry point (run_default_preprocessing).
- Strategy Pattern: Implemented multiple times to provide interchangeable components:
  - Optimization algorithms in LinearRegression via LeastSquaresStrategy and GradientDescentStrategy.
  - Distance formulations in KNNClassifier via EuclideanDistance and ManhattanDistance.
  - Data imputation in DataProcessor via MeanImputer, MedianImputer, and ModeImputer.
  - Variable transformations via LabelEncoder, OneHotEncoder, and TargetEncoder.
- Factory Method Pattern: Used to create appropriate data loaders without binding to concrete files. DataCreator acts as the creator interface, declaring create_document(). Concrete implementations CSVCreator and JSONCreator override this method to instantiate and return CSVLoader or JSONLoader respectively, abstracting the instantiation process away from the main pipeline.
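
The Factory Method arrangement described above could be sketched like this. The class names and create_document() come from the description; the load method, the CREATORS lookup table, and the load_any helper are hypothetical, and the loaders return plain lists of dicts instead of dataframes to keep the example dependency-free.

```python
import csv
import json
from abc import ABC, abstractmethod
from pathlib import Path


class DataLoader(ABC):
    @abstractmethod
    def load(self, path: str) -> list[dict]:
        """Read the file at `path` and return its rows."""


class CSVLoader(DataLoader):
    def load(self, path: str) -> list[dict]:
        with open(path, newline="") as fh:
            return list(csv.DictReader(fh))


class JSONLoader(DataLoader):
    def load(self, path: str) -> list[dict]:
        with open(path) as fh:
            return json.load(fh)


class DataCreator(ABC):
    @abstractmethod
    def create_document(self) -> DataLoader:
        """Factory method: subclasses choose which loader to build."""


class CSVCreator(DataCreator):
    def create_document(self) -> DataLoader:
        return CSVLoader()


class JSONCreator(DataCreator):
    def create_document(self) -> DataLoader:
        return JSONLoader()


# Hypothetical mapping from file extension to creator class.
CREATORS = {".csv": CSVCreator, ".json": JSONCreator}


def load_any(path: str) -> list[dict]:
    # The caller only sees DataCreator and DataLoader, never a concrete loader.
    creator = CREATORS[Path(path).suffix.lower()]()
    return creator.create_document().load(path)
```

Supporting another format under this design would mean adding one loader, one creator, and one entry in the mapping, with no change to load_any.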

Project details

Release history

This version

0.1.0

Jun 6, 2026

Download files

Source Distribution

my_python_lib_tarik-0.1.0.tar.gz (12.1 kB)

Uploaded Jun 6, 2026 Source

Built Distribution

my_python_lib_tarik-0.1.0-py3-none-any.whl (9.1 kB)

Uploaded Jun 6, 2026 Python 3

File details

Details for the file my_python_lib_tarik-0.1.0.tar.gz.

File metadata

Download URL: my_python_lib_tarik-0.1.0.tar.gz
Upload date: Jun 6, 2026
Size: 12.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for my_python_lib_tarik-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`7b43a248a77ef1438d4f83345a642d09f4243237186167b9a64bb78653853291`
MD5	`8817b0f8220178bd4dda61b9ce0a2f2a`
BLAKE2b-256	`7f7300693470a80f8a89b6415d9921ace406e82b34c0a1fa5e3e91540852c1c7`

File details

Details for the file my_python_lib_tarik-0.1.0-py3-none-any.whl.

File metadata

Download URL: my_python_lib_tarik-0.1.0-py3-none-any.whl
Upload date: Jun 6, 2026
Size: 9.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for my_python_lib_tarik-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`4b6cc21118d4a2d060fc548e18d694a6ca6a1eac28f8d51428fb8a8aabb72bb3`
MD5	`c34abc6adeceb33c0bf4e6fe525832cd`
BLAKE2b-256	`f31feb9def8b00ccdb9564f410ca261e871b412cd98957438bc128f7c0813f96`

my-python-lib-tarik 0.1.0
