Python gap filling toolkit

nona: Python gap filling toolkit

What is it?

nona is a simple toolkit for filling gaps in a dataset. It fills the gaps using artificial intelligence methods.

nona goes through all the columns. For each column with gaps, it splits the dataset into a train part, the rows where the column's value is known, and a test part, the rows where the value is missing. It then predicts the missing values using any machine learning method that supports a simple implementation of fit and predict.
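The column-by-column procedure described above can be sketched roughly as follows. This is a minimal illustration built on scikit-learn, not nona's actual source: the function name `fill_gaps_sketch`, the `max_classes` heuristic for choosing between classification and regression, and the temporary zero-filling of gaps in the feature columns are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.base import clone
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Ridge


def fill_gaps_sketch(df, reg=None, clf=None, max_classes=10):
    """Illustrative column-by-column gap filling (hypothetical helper)."""
    reg = Ridge(alpha=0.1) if reg is None else reg
    clf = RandomForestClassifier(max_depth=2, random_state=0) if clf is None else clf
    out = df.copy()
    for col in df.columns[df.isna().any()]:
        missing = out[col].isna()
        # Features are all other columns. Gaps that remain in them are set
        # to 0 so estimators that reject NaN can be used (assumption).
        features = out.drop(columns=col).fillna(0)
        known_y = out.loc[~missing, col]
        # Assumed heuristic: few distinct integer-valued targets mean
        # classification, anything else is treated as regression.
        is_class = known_y.nunique() <= max_classes and (known_y % 1 == 0).all()
        model = clone(clf if is_class else reg)
        # "Train" rows have a known value, "test" rows are the gaps.
        model.fit(features[~missing], known_y)
        out.loc[missing, col] = model.predict(features[missing])
    return out


# Toy data: y depends linearly on x, label is a binary column.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
demo = pd.DataFrame({"x": x, "y": 2 * x + 1, "label": (x > 0).astype(float)})
demo.loc[demo.index[::10], "y"] = np.nan
demo.loc[demo.index[5::10], "label"] = np.nan

filled = fill_gaps_sketch(demo)
print(filled.isna().sum().sum())  # number of gaps left after filling
```

Columns filled earlier in the loop serve as features for later columns, which is one reasonable reading of the description; the library may order or handle columns differently.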

Main Features

Here are just a few of the things that nona does well:

  • Easy and fast filling of missing values
  • Gap filling driven by machine learning methods
  • Support for any machine learning method that implements fit and predict
  • High prediction accuracy for missing values

Where to get it

The source code is currently hosted on GitHub at AbdualimovTP/nona (library for filling in missing values using artificial intelligence methods).

Binary installers for the latest released version are available at the Python Package Index (PyPI).

# PyPI
pip install nona

Dependencies

Quick start

Out of the box, nona uses ridge regression to fill the gaps in columns that pose a regression problem, and RandomForestClassifier in columns that pose a classification problem.

# load library
from nona.nona import nona


# prepare your data for ML: missing values (NaN) mark the gaps to fill
# the dataset must contain only numerical values


# fill the missing values
nona(YOUR_DATA)
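Because the dataset must contain only numerical values, non-numeric columns need to be encoded first. The snippet below is one possible way to do that with pandas while keeping the gaps as NaN; the toy columns and the choice of integer codes via `pd.factorize` are assumptions for illustration and are not part of nona.

```python
import numpy as np
import pandas as pd

# Toy table mixing numeric and non-numeric columns (illustrative data).
raw = pd.DataFrame({
    "age": [39, 46, np.nan, 61],
    "sex": ["M", "F", "F", None],
    "smoker": [True, False, None, True],
})

prepared = raw.copy()
for col in prepared.columns:
    if not pd.api.types.is_numeric_dtype(prepared[col]):
        # factorize assigns an integer code per distinct value and -1 to
        # missing entries; turn the -1 codes back into NaN.
        codes, _ = pd.factorize(prepared[col])
        as_series = pd.Series(codes, index=prepared.index)
        prepared[col] = as_series.where(codes >= 0).astype(float)

print(prepared.dtypes)
print(prepared)
```

Integer codes impose an arbitrary ordering on categories, so one-hot encoding may be a better fit for columns with many unordered categories.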

Accuracy improvement

You can pass other machine learning methods to the function to improve accuracy. Each one must support a simple implementation of fit and predict.

Parameters:

  • data: prepared dataset

  • algreg: regression algorithm used to predict missing values in columns

  • algclass: classification algorithm used to predict missing values in columns

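# load libraries
from nona.nona import nona
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


# prepare your data for ML: missing values (NaN) mark the gaps to fill
# the dataset must contain only numerical values


# fill the missing values, passing the estimators explicitly
nona(
    data=YOUR_DATA,
    algreg=make_pipeline(StandardScaler(with_mean=False), Ridge(alpha=0.1)),
    algclass=RandomForestClassifier(max_depth=2, random_state=0),
)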
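To make the "fit and predict" requirement concrete, here is a minimal sketch of an object exposing just those two methods. The class `MedianRegressor` is hypothetical and has not been checked against nona itself, which may expect more of the scikit-learn estimator interface than this.

```python
import numpy as np


class MedianRegressor:
    """Hypothetical estimator: always predicts the median of the training target."""

    def fit(self, X, y):
        # Learn a single number from the rows where the value is known.
        self.median_ = float(np.median(np.asarray(y, dtype=float)))
        return self

    def predict(self, X):
        # Return one prediction per row that needs a value.
        return np.full(len(X), self.median_)


model = MedianRegressor().fit([[1], [2], [3]], [10, 20, 60])
predictions = model.predict([[4], [5]])
print(predictions)  # two predictions, both equal to the median 20.0
```

In practice a ready-made scikit-learn regressor or classifier is the safer choice, since it follows the full estimator conventions.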

Comparison of accuracy with other gap filling methods

Dataset: Framingham heart study dataset (Kaggle)

RMSE of each gap filling technique, depending on the percentage of missing values in the dataset (lower is better).

Method          10%    20%    30%    40%    50%    70%    90%
Baseline_MEAN   2.67   3.8    4.7    5.66   6.4    7.4    8.43
KNN             2.48   3.7    4.57   5.55   6.35   7.47   8.49
MICE            2.12   3.17   4.59   5.41   5.94   7.33   8.61
MISSFOREST      2.26   3.36   4.31   5.33   6.15   8.06   9.85
NONA            2.24   3.35   4.28   5.16   5.83   7.12   8.43
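A comparison of this kind is commonly produced by hiding known values, filling them back in, and measuring RMSE on the hidden cells only. The sketch below shows that idea for a mean-fill baseline on synthetic data; it is an assumed protocol for illustration, not the exact benchmark behind the table, and `masked_rmse` is a hypothetical helper.

```python
import numpy as np


def masked_rmse(true_values, filled_values, mask):
    """RMSE computed only over the cells that were artificially hidden."""
    diff = (true_values - filled_values)[mask]
    return float(np.sqrt(np.mean(diff ** 2)))


rng = np.random.default_rng(42)
# Synthetic complete dataset (illustrative, not the Framingham data).
complete = rng.normal(loc=50.0, scale=10.0, size=(500, 4))

# Hide roughly 20% of the cells at random.
mask = rng.random(complete.shape) < 0.2
with_gaps = np.where(mask, np.nan, complete)

# Baseline: fill every gap with the mean of the observed values in its column.
col_means = np.nanmean(with_gaps, axis=0)
mean_filled = np.where(mask, col_means, with_gaps)

baseline_rmse = masked_rmse(complete, mean_filled, mask)
print(round(baseline_rmse, 2))
```

Any other gap filling method can be scored the same way by replacing `mean_filled` with its output, and repeating the run for several missing-value percentages yields one table row per method.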
