A modular, object-oriented framework for machine learning and data preprocessing
Project description
Machine Learning & Data Preprocessing Library
Introduction
This is a simple machine learning library that implements Linear Regression, a KNN classifier, and several data preprocessing algorithms from scratch on top of the numpy and pandas libraries.
Mapping Core Learning Outcomes
The project applies all six required patterns. Each one is explained below.
1. Object-Oriented Programming (OOP)
- Where: In `core.py` and `data.py`.
- How:
  - Inheritance & Abstraction: Abstract base classes (`BaseAlgorithm`, `RegressionStrategy`, `DistanceMetric`, `DataLoader`, `DataCreator`, `ImputeStrategy`, `EncodingStrategy`) define the interfaces that concrete classes must implement.
  - Polymorphism: Concrete implementations substitute for their base classes. For example, `LinearRegression` runs `.train()` through whichever regression strategy it was given, so its own code does not change when the strategy does.
  - Encapsulation: State is kept internal. In `data.py`, the raw dataframe is stored in the protected attribute `self._data` and exposed through a `@property` getter.
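The three OOP ideas above can be sketched in a few lines. This is a minimal illustration, not the library's code: the class names `RegressionStrategy`, `LeastSquaresStrategy`, and `LinearRegression` come from the description, while the method signatures, the `predict` method, and the `DataHolder` class are assumptions made for the example.

```python
from abc import ABC, abstractmethod

import numpy as np


class RegressionStrategy(ABC):
    """Abstract blueprint (assumed signature; the real one may differ)."""

    @abstractmethod
    def train(self, X, y):
        """Return fitted weights, intercept first."""


class LeastSquaresStrategy(RegressionStrategy):
    def train(self, X, y):
        # Prepend a column of ones for the intercept, then solve least squares.
        Xb = np.column_stack([np.ones(len(X)), X])
        weights, *_ = np.linalg.lstsq(Xb, y, rcond=None)
        return weights


class LinearRegression:
    def __init__(self, strategy):
        self._strategy = strategy
        self._weights = None

    def train(self, X, y):
        # Polymorphic call: any RegressionStrategy subclass can be plugged in.
        self._weights = self._strategy.train(X, y)
        return self

    def predict(self, X):
        Xb = np.column_stack([np.ones(len(X)), X])
        return Xb @ self._weights


class DataHolder:
    """Hypothetical stand-in for the class in data.py that owns the dataframe."""

    def __init__(self, data):
        self._data = data  # protected attribute

    @property
    def data(self):
        # Read-only access: no setter is defined, so assignment fails.
        return self._data
```

Instantiating `RegressionStrategy` directly raises `TypeError`, which is how the abstract base class enforces the contract.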
2. Functional Programming
- Where: In `core.py` and `utils.py`.
- How:
  - Pure Functions & Lambda: `evaluate_model` does not modify external state and depends only on its input arguments; it computes the mean squared error with a lambda.
  - Higher-Order Functions & Map/Reduce: `reduce` combined with `lambda` is used inside `evaluate_model` to sum squared errors. `map` is used inside `series_to_ndarray` to cast the values of a pandas Series to floats. `apply_pipeline` uses `reduce` to apply a list of transformation callables to the data in order (`reduce(lambda d, func: func(d), transformations, data)`).
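A rough sketch of the two `reduce`-based helpers described above follows. The function names and the `apply_pipeline` expression come from the description; the argument names, argument order, and the use of plain Python sequences instead of numpy arrays are assumptions.

```python
from functools import reduce


def evaluate_model(y_true, y_pred):
    """Mean squared error; reads only its arguments and mutates nothing."""
    squared_error = lambda pair: (pair[0] - pair[1]) ** 2
    # Fold the (true, predicted) pairs into a running sum of squared errors.
    total = reduce(lambda acc, pair: acc + squared_error(pair),
                   zip(y_true, y_pred), 0.0)
    return total / len(y_true)


def apply_pipeline(data, transformations):
    """Pass data through each callable in order, left to right."""
    return reduce(lambda d, func: func(d), transformations, data)


# Usage: double every value, then sum the result.
double = lambda values: [v * 2 for v in values]
result = apply_pipeline([1, 2, 3], [double, sum])
```

Because `reduce` starts from `data` and applies each callable to the previous result, an empty transformation list returns the input unchanged.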
3. Concurrency (Multi-threading)
- Where: Implemented in `core.py` inside the `KNNClassifier` class.
- How:
  - Predicting classes for large feature matrices one sample at a time is computationally expensive. The `predict` method starts a separate `threading.Thread` for every sample to be classified.
  - The `_predict_single` worker computes the prediction for one row and stores it in a shared, pre-allocated numpy results array (`results[index]`).
  - Threads are started in a loop with `t.start()` and then joined with `t.join()`, so the main thread blocks until every prediction has finished.
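The start/join structure described above might look roughly like the following. Only `KNNClassifier`, `predict`, `_predict_single`, and `results[index]` are taken from the description; the `train` method, the constructor argument `k`, the hard-coded Euclidean distance, and the majority vote are assumptions. Note that in CPython the global interpreter lock limits how much CPU-bound work threads can overlap, so this sketch shows the synchronization pattern rather than a guaranteed speedup.

```python
import threading
from collections import Counter

import numpy as np


class KNNClassifier:
    def __init__(self, k=3):
        self.k = k

    def train(self, X, y):
        # KNN is lazy: training only stores the data.
        self._X = np.asarray(X, dtype=float)
        self._y = np.asarray(y)
        return self

    def _predict_single(self, sample, index, results):
        # Distance from this sample to every training row (Euclidean here).
        distances = np.sqrt(((self._X - sample) ** 2).sum(axis=1))
        nearest = np.argsort(distances)[: self.k]
        votes = Counter(self._y[nearest].tolist())
        # Each thread writes only to its own slot, so no lock is needed.
        results[index] = votes.most_common(1)[0][0]

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        results = np.empty(len(X), dtype=self._y.dtype)  # pre-allocated
        threads = [
            threading.Thread(target=self._predict_single, args=(row, i, results))
            for i, row in enumerate(X)
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()  # block until every worker has written its result
        return results
```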
4. Recursion / Dynamic Programming
- Where: In `core.py` inside the `EuclideanDistance` class.
- How:
  - Distance metrics are usually computed with loops or high-level library functions. This implementation instead accumulates the element-wise squared differences with a custom recursive function, `recursive_sum_sq(a, b, idx)`.
  - It adds the squared difference at each index until it reaches the base case (`idx == len(a)`); the square root of the accumulated sum is then returned as the distance.
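One way the recursion could be written is shown below. The name `recursive_sum_sq(a, b, idx)` and the base case `idx == len(a)` are from the description; the method name `calculate`, the default `idx=0`, and the choice to take the square root in the caller rather than inside the recursion are assumptions.

```python
import math


def recursive_sum_sq(a, b, idx=0):
    """Sum of squared differences from position idx to the end."""
    if idx == len(a):  # base case: past the last element
        return 0.0
    return (a[idx] - b[idx]) ** 2 + recursive_sum_sq(a, b, idx + 1)


class EuclideanDistance:
    def calculate(self, a, b):
        # Square root of the recursively accumulated sum.
        return math.sqrt(recursive_sum_sq(a, b, 0))


distance = EuclideanDistance().calculate([0, 0], [3, 4])
```

Each element adds one stack frame, so vectors longer than Python's recursion limit (1000 frames by default) would raise `RecursionError` with this approach.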
5. SOLID Principles
- Where: In `core.py` and `data.py`.
- How:
  - Single Responsibility Principle (SRP): Each class has one job. `CSVLoader` only loads data; `MeanImputer` only fills in missing values; `DataProcessor` focuses on data manipulation.
  - Open/Closed Principle (OCP): The system is open for extension but closed for modification. Adding a new distance metric (e.g., Cosine Distance) only requires subclassing `DistanceMetric`; `KNNClassifier` is left untouched.
  - Liskov Substitution Principle (LSP): Derived classes are interchangeable with their abstractions. Any encoder (`LabelEncoder`, `OneHotEncoder`, `TargetEncoder`) satisfies the interface that `DataProcessor` expects.
  - Interface Segregation Principle (ISP): Interfaces stay small and decoupled. `RegressionStrategy` requires a single method (`train`) and nothing unrelated.
  - Dependency Inversion Principle (DIP): High-level objects depend on abstractions rather than on low-level concrete modules. `LinearRegression` depends only on the `RegressionStrategy` interface, which decouples the model from any specific training algorithm.
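The Open/Closed example above (adding a cosine distance) can be illustrated as follows. `DistanceMetric` and `ManhattanDistance` are names from the description; `CosineDistance`, the method name `calculate`, and the helper `nearest_index` (a stand-in for the neighbour search inside `KNNClassifier`) are hypothetical.

```python
import math
from abc import ABC, abstractmethod


class DistanceMetric(ABC):
    @abstractmethod
    def calculate(self, a, b):
        """Return a non-negative distance between two vectors."""


class ManhattanDistance(DistanceMetric):
    def calculate(self, a, b):
        return sum(abs(x - y) for x, y in zip(a, b))


class CosineDistance(DistanceMetric):
    """New metric added by subclassing only; no existing code is edited."""

    def calculate(self, a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return 1.0 - dot / norms


def nearest_index(metric, query, points):
    # Depends only on the DistanceMetric interface, never on a concrete class.
    return min(range(len(points)), key=lambda i: metric.calculate(query, points[i]))


points = [[10, 0], [1, 1]]
by_manhattan = nearest_index(ManhattanDistance(), [1, 0], points)
by_cosine = nearest_index(CosineDistance(), [1, 0], points)
```

The same query picks a different neighbour under each metric, yet `nearest_index` itself never changes.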
6. Architectural & Design Patterns
- Where: Full design of `data.py` and `core.py`.
- How:
  - Pipeline Architecture: Managed by `DataPipeline`, which combines file checking, concrete factory creation, loading, and feature preparation into a single linear API (`run_default_preprocessing`).
  - Strategy Pattern: Implemented multiple times to provide interchangeable components:
    - Optimization algorithms in `LinearRegression` via `LeastSquaresStrategy` and `GradientDescentStrategy`.
    - Distance formulations in `KNNClassifier` via `EuclideanDistance` and `ManhattanDistance`.
    - Data imputation in `DataProcessor` via `MeanImputer`, `MedianImputer`, and `ModeImputer`.
    - Variable transformations via `LabelEncoder`, `OneHotEncoder`, and `TargetEncoder`.
  - Factory Method Pattern: Used to create the appropriate data loader without binding to concrete files. `DataCreator` acts as the creator interface, declaring `create_document()`. The concrete implementations `CSVCreator` and `JSONCreator` override this method to instantiate and return `CSVLoader` or `JSONLoader` respectively, abstracting the instantiation away from the main pipeline.
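The Factory Method arrangement described above can be sketched like this. The class names and `create_document()` come from the description; the `load` method, the `CREATORS` registry, the `loader_for` helper, and returning plain Python structures instead of a pandas DataFrame are assumptions made to keep the example dependency-free.

```python
import csv
import json
from abc import ABC, abstractmethod
from pathlib import Path


class DataLoader(ABC):
    @abstractmethod
    def load(self, path):
        """Read the file at path and return its records."""


class CSVLoader(DataLoader):
    def load(self, path):
        with open(path, newline="") as f:
            return list(csv.DictReader(f))


class JSONLoader(DataLoader):
    def load(self, path):
        with open(path) as f:
            return json.load(f)


class DataCreator(ABC):
    @abstractmethod
    def create_document(self):
        """Factory method: return a concrete DataLoader."""


class CSVCreator(DataCreator):
    def create_document(self):
        return CSVLoader()


class JSONCreator(DataCreator):
    def create_document(self):
        return JSONLoader()


# Hypothetical registry mapping file suffixes to creators.
CREATORS = {".csv": CSVCreator, ".json": JSONCreator}


def loader_for(path):
    # The caller never names a concrete loader; the creator decides.
    creator = CREATORS[Path(path).suffix.lower()]()
    return creator.create_document()
```

A pipeline would then call `loader_for(path).load(path)`; supporting another format means adding one loader, one creator, and one registry entry.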
File details
Details for the file my_python_lib_tarik-0.1.0.tar.gz.
File metadata
- Download URL: my_python_lib_tarik-0.1.0.tar.gz
- Upload date:
- Size: 12.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7b43a248a77ef1438d4f83345a642d09f4243237186167b9a64bb78653853291 |
| MD5 | 8817b0f8220178bd4dda61b9ce0a2f2a |
| BLAKE2b-256 | 7f7300693470a80f8a89b6415d9921ace406e82b34c0a1fa5e3e91540852c1c7 |
File details
Details for the file my_python_lib_tarik-0.1.0-py3-none-any.whl.
File metadata
- Download URL: my_python_lib_tarik-0.1.0-py3-none-any.whl
- Upload date:
- Size: 9.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4b6cc21118d4a2d060fc548e18d694a6ca6a1eac28f8d51428fb8a8aabb72bb3 |
| MD5 | c34abc6adeceb33c0bf4e6fe525832cd |
| BLAKE2b-256 | f31feb9def8b00ccdb9564f410ca261e871b412cd98957438bc128f7c0813f96 |