
Python gap filling toolkit




nona: Python gap filling toolkit

What is it?

nona is a simple toolkit for filling gaps (missing values) in a dataset using artificial intelligence methods.

nona goes through all the columns. When it finds a column with gaps, it splits the dataset into a train part (the rows where the value in that column is known) and a test part (the rows where it is missing). A model is fitted on the train part and then predicts the missing values. Any machine learning method that implements a simple fit and predict interface can be used.
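The column-by-column procedure described above can be sketched with pandas and scikit-learn as follows. This is a hypothetical re-implementation for illustration, not nona's source code: the function name `fill_gaps_sketch`, the rule that decides between classification and regression (at most `max_classes` distinct integer values means classification), and the temporary mean filling of gaps in the predictor columns are all assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Ridge


def fill_gaps_sketch(data, regressor=None, classifier=None, max_classes=10):
    """Fill each column's gaps with a model trained on the rows where it is known.

    Hypothetical illustration of the idea, not nona's actual implementation.
    """
    regressor = regressor if regressor is not None else Ridge()
    classifier = classifier if classifier is not None else RandomForestClassifier(random_state=0)
    result = data.copy()
    for column in data.columns:
        missing = data[column].isna()
        if not missing.any():
            continue  # no gaps in this column
        # Predictors are all other columns; their own gaps are mean-filled
        # only for this step (an assumption made to keep the sketch simple).
        features = data.drop(columns=column)
        features = features.fillna(features.mean())
        known = data.loc[~missing, column]
        # Assumed rule: few distinct integer values -> classification task.
        is_classification = known.nunique() <= max_classes and bool((known % 1 == 0).all())
        model = classifier if is_classification else regressor
        model.fit(features[~missing], known)  # "train": rows with a known value
        result.loc[missing, column] = model.predict(features[missing])  # "test": the gaps
    return result


# Toy data (made up for this example), numerical columns only.
df = pd.DataFrame({
    "age": [39, 46, 48, 61, 46, 43, 63, 45],
    "sysBP": [106.0, 121.0, np.nan, 150.0, 130.5, 180.0, 138.0, np.nan],
    "smoker": [0, 0, 1, 1, 1, np.nan, 0, 1],
})
filled = fill_gaps_sketch(df)
print(filled.isna().sum().sum())  # 0
```

Here `sysBP` is treated as a regression target and `smoker` as a classification target. Any object exposing `fit` and `predict` can be passed as `regressor` or `classifier`, playing the role of nona's regression and classification model parameters.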

Main Features

Here are just a few of the things that nona does well:

  • Easy and fast filling of missing values
  • Uses machine learning methods
  • Works with any machine learning model that implements fit and predict
  • High accuracy of the predicted missing values (see the RMSE comparison below)

Where to get it

The source code is currently hosted on GitHub at AbdualimovTP/nona. Binary installers for the latest released version are available at the Python Package Index (PyPI).

# PyPI
pip install nona

Dependencies

Quick start

Out of the box, nona uses ridge regression to fill gaps in columns that pose a regression problem, and RandomForestClassifier for columns that pose a classification problem.

# load library
from nona.nona import nona


# prepare your data for ML, keeping the missing values (NaN)
# the dataset must contain numerical values only


# fill the missing values
nona(YOUR_DATA)
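The data-preparation step in the comments above (numerical values only, gaps kept as NaN) could look like the sketch below for a DataFrame that also has text columns. The helper `to_numeric_keep_na` is hypothetical and not part of nona; it uses integer category codes, which is only one possible encoding (one-hot encoding is another).

```python
import numpy as np
import pandas as pd


def to_numeric_keep_na(df):
    """Hypothetical helper: encode non-numeric columns as numbers, keeping NaN gaps."""
    out = df.copy()
    for column in out.columns:
        if not pd.api.types.is_numeric_dtype(out[column]):
            codes = out[column].astype("category").cat.codes  # missing values become -1
            out[column] = codes.where(codes >= 0)  # turn -1 back into NaN
    return out


raw = pd.DataFrame({
    "age": [39, 46, np.nan, 61],
    "education": ["high school", np.nan, "college", "college"],
    "sysBP": [106.0, 121.0, 127.5, np.nan],
})
prepared = to_numeric_keep_na(raw)
print(prepared["education"].tolist())  # [1.0, nan, 0.0, 0.0]
```

Categories are numbered in sorted order, so "college" becomes 0 and "high school" becomes 1, while every gap stays NaN for the filling step.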

Accuracy improvement

You can pass other machine learning models to the function. Each model must implement the standard fit and predict methods.

Parameters:

  • data: prepared dataset

  • algreg: Regression algorithm to predict missing values in columns

  • algclass: Classification algorithm to predict missing values in columns

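# load libraries
from nona.nona import nona
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


# prepare your data for ML, keeping the missing values (NaN)
# the dataset must contain numerical values only


# fill the missing values with custom models
nona(
    data=YOUR_DATA,
    algreg=make_pipeline(StandardScaler(with_mean=False), Ridge(alpha=0.1)),
    algclass=RandomForestClassifier(max_depth=2, random_state=0),
)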
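Since the only stated requirement is support for fit and predict, a custom model can be a small class like the hypothetical `MedianRegressor` below, which ignores the features and always predicts the median of the training targets. Whether nona calls anything beyond `fit(X, y)` and `predict(X)` is not documented here, so treat that as an assumption.

```python
import numpy as np


class MedianRegressor:
    """Hypothetical minimal model exposing only fit(X, y) and predict(X)."""

    def fit(self, X, y):
        # Remember the median of the known target values.
        self.median_ = float(np.median(np.asarray(y, dtype=float)))
        return self  # returning self follows the scikit-learn convention

    def predict(self, X):
        # Predict the same value for every row of X.
        return np.full(len(X), self.median_)


X_train = np.array([[1.0], [2.0], [3.0]])
y_train = np.array([10.0, 30.0, 20.0])
model = MedianRegressor().fit(X_train, y_train)
print(model.predict(np.array([[4.0], [5.0]])))  # [20. 20.]
```

Under that assumption it would be passed like any other model, for example `nona(data=YOUR_DATA, algreg=MedianRegressor())`. A trivial baseline like this is mainly useful for measuring how much a real model improves on it.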

Comparison of accuracy with other gap filling methods

Dataset: Framingham heart study dataset (Kaggle)

RMSE of each gap-filling technique, depending on the percentage of missing values in the dataset (lower is better):

Method          10%    20%    30%    40%    50%    70%    90%
Baseline_MEAN   2.67   3.8    4.7    5.66   6.4    7.4    8.43
KNN             2.48   3.7    4.57   5.55   6.35   7.47   8.49
MICE            2.12   3.17   4.59   5.41   5.94   7.33   8.61
MISSFOREST      2.26   3.36   4.31   5.33   6.15   8.06   9.85
NONA            2.24   3.35   4.28   5.16   5.83   7.12   8.43
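The evaluation protocol behind the table is not spelled out. A common way to produce such numbers is to hide a random share of known values, fill them in, and compute the RMSE only over the hidden cells M, that is sqrt(mean over M of (x − x̂)²). The NumPy sketch below does this for the mean-filling baseline; the helper names and the masking scheme are assumptions, not the procedure used for the table.

```python
import numpy as np


def hide_random(values, fraction, seed=0):
    """Hide roughly `fraction` of the cells by setting them to NaN."""
    rng = np.random.default_rng(seed)
    mask = rng.random(values.shape) < fraction  # True = hidden cell
    hidden = values.astype(float)  # astype returns a copy
    hidden[mask] = np.nan
    return hidden, mask


def mean_fill(values):
    """Baseline: replace each NaN with the mean of its column's known values."""
    column_means = np.nanmean(values, axis=0)
    return np.where(np.isnan(values), column_means, values)


def rmse_on_hidden(true_values, filled, mask):
    """Root-mean-square error computed over the hidden cells only."""
    errors = true_values[mask] - filled[mask]
    return float(np.sqrt(np.mean(errors ** 2)))


# Hand-picked mask: hide cell (0, 0) and cell (2, 1).
true_values = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
mask = np.array([[True, False], [False, False], [False, True]])
hidden = np.where(mask, np.nan, true_values)
print(rmse_on_hidden(true_values, mean_fill(hidden), mask))  # 3.0

# Sweep over several missing-value percentages on random toy data.
data = np.random.default_rng(1).normal(size=(200, 3))
for fraction in (0.1, 0.3, 0.5):
    sim_hidden, sim_mask = hide_random(data, fraction)
    print(fraction, rmse_on_hidden(data, mean_fill(sim_hidden), sim_mask))
```

Swapping `mean_fill` for another imputer, with the same masks, gives a like-for-like comparison across methods.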


Download files

Source distribution: nona-0.0.2.tar.gz (7.8 kB)

Built distribution: nona-0.0.2-py3-none-any.whl (8.2 kB), for Python 3
