MLimputer - Null Imputation Framework for Supervised Machine Learning

Framework Contextualization

The MLimputer project is a complete, integrated pipeline that automates the handling of missing values in datasets through regression prediction. It aims to reduce bias and increase the precision of imputation results compared with more classic imputation methods. The package provides multiple algorithm options to impute your data (listed below): every column with missing values is fitted with a robust preprocessing approach and subsequently predicted.

The architecture comprises three main sections: missing data analysis, data preprocessing, and predictive model imputation, organized in a customizable pipeline structure.
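
The idea of imputing a column through regression prediction can be sketched with pandas and scikit-learn. This is a minimal illustration, not MLimputer's internal implementation: the helper name impute_column_by_regression is hypothetical, all predictor columns are assumed to be numeric, and gaps in the predictors are filled with column means as a simple stand-in for the package's preprocessing step.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor


def impute_column_by_regression(df: pd.DataFrame, target_col: str) -> pd.DataFrame:
    """Fill missing values in target_col with predictions from the other columns."""
    out = df.copy()
    missing = out[target_col].isna()
    if not missing.any() or missing.all():
        return out  # nothing to impute, or no observed values to learn from

    # Predictors: every other column, with gaps filled by the column mean
    # (an assumed, simplified preprocessing step).
    features = out.drop(columns=[target_col])
    features = features.fillna(features.mean())

    # Fit on rows where the target is observed, predict where it is missing.
    model = RandomForestRegressor(n_estimators=40, random_state=0)
    model.fit(features[~missing], out.loc[~missing, target_col])
    out.loc[missing, target_col] = model.predict(features[missing])
    return out


# Toy data: column "b" is roughly 2 * "a", with two values removed.
rng = np.random.default_rng(0)
toy = pd.DataFrame({"a": np.arange(20, dtype=float)})
toy["b"] = 2 * toy["a"] + rng.normal(scale=0.1, size=20)
toy.loc[[3, 11], "b"] = np.nan

imputed = impute_column_by_regression(toy, "b")
print(imputed.loc[[3, 11], "b"])
```

Only the rows that were missing are changed; observed values are left untouched, which is what distinguishes this approach from refitting or smoothing the whole column.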

This project aims at providing the following application capabilities:

General applicability on tabular datasets: The imputation procedures apply to any data table used in a Supervised ML context, based on the columns with missing data to be imputed.
Robustness and improvement of predictive results: MLimputer preprocessing aims to improve predictive performance through customization and optimization of how existing missing values in the dataset's input columns are imputed.

Main Development Tools

Major frameworks used to build this project:

Where to get it

The binary installer for the latest released version is available on the Python Package Index (PyPI).

The source code is currently hosted on GitHub at: https://github.com/TsLu1s/MLimputer

Installation

To install this package from the PyPI repository, run the following command:

pip install mlimputer
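
After installing, one way to confirm which version is present is to query the installed package metadata with the standard library. This is a small sketch that assumes the distribution name is mlimputer, as in the pip command above; it reports a missing installation instead of raising.

```python
from importlib.metadata import PackageNotFoundError, version

# Look up the installed distribution by the name used in the pip command.
try:
    installed = version("mlimputer")
except PackageNotFoundError:
    installed = None

if installed is None:
    print("mlimputer is not installed in this environment")
else:
    print(f"mlimputer version: {installed}")
```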

Usage Examples

The first step after importing the package is to load a dataset, split it, and define your chosen imputation model in the fit_imput function. The imputation model options for handling missing data in your dataset are the following:

RandomForest
ExtraTrees
GBR
KNN
XGBoost
Lightgbm
Catboost

After fitting your imputation model, pass the fitted imputer to the fit_configs parameter of the transform_imput function. From there you can impute future datasets (validation, test, ...) that share the same data properties. Note that, as the example below shows, you can also customize the imputer's model parameters by changing their configurations and passing them through the imputer_configs parameter.

Through the cross_validation function you can also compare the predictive performance of multiple imputations, allowing you to validate which imputation model best fits your future predictions.
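
The general idea of comparing imputation strategies by downstream predictive performance can be sketched with scikit-learn alone. This is an analogy, not MLimputer's cross_validation function: it uses scikit-learn's SimpleImputer and KNNImputer as stand-in imputers, synthetic data, and places each imputer inside a pipeline so that it is refit on every training fold.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer, SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Synthetic regression data with roughly 10% of feature values removed.
rng = np.random.default_rng(42)
X = pd.DataFrame(rng.normal(size=(200, 4)), columns=["x1", "x2", "x3", "x4"])
y = 3 * X["x1"] - 2 * X["x2"] + rng.normal(scale=0.5, size=200)
X_missing = X.mask(rng.random(X.shape) < 0.1)

# Candidate imputers to compare (stand-ins for different imputation models).
candidates = {
    "mean": SimpleImputer(strategy="mean"),
    "knn": KNNImputer(n_neighbors=5),
}

# Score each imputer with the same downstream model and the same 3 folds.
leaderboard = {}
for name, imputer in candidates.items():
    pipeline = make_pipeline(imputer, LinearRegression())
    scores = cross_val_score(
        pipeline, X_missing, y, cv=3, scoring="neg_mean_absolute_error"
    )
    leaderboard[name] = -scores.mean()  # mean absolute error, lower is better

print(pd.Series(leaderboard).sort_values())
```

Keeping the downstream model and the folds fixed means that differences in the scores can be attributed to the imputation step rather than to the evaluation setup.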

Important Notes:

The current version of this package does not impute categorical values; only the automatic handling of numeric missing values is implemented.
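
Given that limitation, it can help to check beforehand which columns with missing values are numeric and which are not. The following sketch uses only pandas; the example DataFrame and its column names are invented for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative data: two numeric columns and one text column, each with a gap.
df = pd.DataFrame(
    {
        "age": [25.0, np.nan, 40.0, 31.0],
        "income": [52000.0, 61000.0, np.nan, 48000.0],
        "city": ["Porto", None, "Lisbon", "Braga"],
    }
)

# Columns that contain at least one missing value.
cols_with_nulls = df.columns[df.isna().any()].tolist()

# Split them by dtype: numeric columns vs everything else.
numeric_cols = df.select_dtypes(include="number").columns
numeric_nulls = [c for c in cols_with_nulls if c in numeric_cols]
other_nulls = [c for c in cols_with_nulls if c not in numeric_cols]

print("Numeric columns with missing values:", numeric_nulls)
print("Non-numeric columns with missing values:", other_nulls)
```

Columns in the second list would need to be handled separately, for example by encoding or by a dedicated categorical imputation step, before or after the numeric imputation.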

import mlimputer as mli
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
import warnings
warnings.filterwarnings("ignore", category=Warning) #-> For a clean console

data = pd.read_csv('csv_directory_path') # Dataframe Loading Example

train, test = train_test_split(data, train_size=0.8)
train, test = train.reset_index(drop=True), test.reset_index(drop=True) # <- Required

# All model imputation options ->  "RandomForest","ExtraTrees","GBR","KNN","XGBoost","Lightgbm","Catboost"

# Model Imputer Customization
hparameters=mli.imputer_parameters()

# Customizing parameters settings
hparameters["RandomForest"]["n_estimators"]=40
hparameters["KNN"]["n_neighbors"]=5
print(hparameters)
    
# Imputation Example 1 : RandomForest

imputer_rf=mli.fit_imput(dataset=train,imput_model="RandomForest",imputer_configs=hparameters)
train_rf=mli.transform_imput(dataset=train,fit_configs=imputer_rf)
test_rf=mli.transform_imput(dataset=test,fit_configs=imputer_rf)

# Imputation Example 2 : KNN

imputer_knn=mli.fit_imput(dataset=train,imput_model="KNN",imputer_configs=hparameters)
train_knn=mli.transform_imput(dataset=train,fit_configs=imputer_knn)
test_knn=mli.transform_imput(dataset=test,fit_configs=imputer_knn)
    
#(...)
    
## Performance Evaluation Example - Imputation CrossValidation

from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from catboost import CatBoostRegressor
        
leaderboard_knn_imp=mli.cross_validation(dataset=train_knn,
                                         target="Target_Name_Col", 
                                         test_size=0.2,
                                         n_splits=3,
                                         models=[LinearRegression(), RandomForestRegressor(), CatBoostRegressor()])

## Export Imputation Metadata

# KNN Imputation Metadata
import pickle
with open("imputer_knn.pkl", "wb") as output: # the file is closed automatically
    pickle.dump(imputer_knn, output)
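
Exported metadata can later be restored with pickle.load. The sketch below shows the save and reload round trip using a plain dictionary as a stand-in for the fitted imputer object; the dictionary's contents are hypothetical, and the file is written to a temporary directory to keep the example self-contained.

```python
import os
import pickle
import tempfile

# Stand-in for a fitted imputer object (hypothetical content, illustration only).
imputer_stub = {"imput_model": "KNN", "n_neighbors": 5}

path = os.path.join(tempfile.mkdtemp(), "imputer_knn.pkl")

# Save the object to disk.
with open(path, "wb") as f:
    pickle.dump(imputer_stub, f)

# Reload it, for example in a later session before imputing new data.
with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored == imputer_stub)
```

Pickle files can execute arbitrary code when loaded, so only unpickle files that come from a source you trust.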

License

Distributed under the MIT License. See LICENSE for more information.

Contact

Luis Santos - LinkedIn


