
It fills NaN values in a NumPy array or a pandas DataFrame using various prediction algorithms.

Project description

CleanFill

CleanFill is a Python library for filling NaN values in a matrix using various prediction techniques. This is useful for collaborative filtering, for example to predict item ratings in a recommendation engine. It replaces each NaN (Not a Number) with a value predicted by the method you choose.
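Slope One, one of the methods CleanFill offers, is a standard collaborative-filtering predictor: it learns the average rating difference between each pair of items and uses it to shift a user's known ratings onto the missing item. The following is a minimal NumPy sketch of plain and weighted Slope One, assuming rows are users and columns are items. The helper name slope_one_fill is hypothetical and this is not CleanFill's own code.

```python
import numpy as np


def slope_one_fill(ratings, weighted=True):
    """Fill NaN cells of a users x items matrix with Slope One predictions.

    Hypothetical helper for illustration, not CleanFill's own code.
    weighted=True gives Weighted Slope One; False gives plain Slope One.
    """
    r = np.asarray(ratings, dtype=float)
    known = ~np.isnan(r)
    n_items = r.shape[1]

    # dev[j, i]: mean of (r[u, j] - r[u, i]) over users u who rated both items
    # count[j, i]: number of such users
    dev = np.zeros((n_items, n_items))
    count = np.zeros((n_items, n_items))
    for j in range(n_items):
        for i in range(n_items):
            both = known[:, j] & known[:, i]
            count[j, i] = both.sum()
            if count[j, i] > 0:
                dev[j, i] = np.mean(r[both, j] - r[both, i])

    filled = r.copy()  # work on a copy so the input is left untouched
    for u, j in zip(*np.where(~known)):
        # Items user u rated that share at least one co-rating user with item j
        rated = [i for i in np.flatnonzero(known[u]) if count[j, i] > 0]
        if not rated:
            continue  # no basis for a prediction, so the cell stays NaN
        preds = np.array([dev[j, i] + r[u, i] for i in rated])
        if weighted:
            weights = count[j, rated]  # more co-ratings -> more influence
            filled[u, j] = np.sum(preds * weights) / np.sum(weights)
        else:
            filled[u, j] = preds.mean()
    return filled


ratings = np.array([[5, 3, 2],
                    [3, 4, np.nan],
                    [np.nan, 2, 5]])
print(slope_one_fill(ratings))                  # cell (1, 2) = 10/3, cell (2, 0) = 13/3
print(slope_one_fill(ratings, weighted=False))  # cell (1, 2) = 2.5, cell (2, 0) = 5.25
```

The only difference between the two variants is the averaging step: the weighted version trusts item pairs that many users rated together more than pairs backed by a single user.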

Often, part of the data is left unfilled. This is frustrating, because you are sometimes forced to discard a lot of data because of missing values. With a library like CleanFill, you can avoid removing this potentially useful data.

It's a simple data transformation tool.

How it works

CleanFill takes a NumPy array (matrix) containing NaN values and fills them with estimated values. For a demonstration, see test.py.
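As an illustration of that contract (a matrix with NaN in, a filled matrix out), here is a minimal NumPy sketch that replaces each NaN with the mean of its column. The README does not say whether CleanFill's Means method averages by column, by row or otherwise, so treat the column choice and the helper name fill_with_column_means as assumptions.

```python
import numpy as np


def fill_with_column_means(matrix):
    """Return a copy of `matrix` with each NaN replaced by its column's mean.

    Hypothetical helper, not CleanFill's implementation.
    """
    filled = np.array(matrix, dtype=float)  # copy, so the input is untouched
    # nanmean ignores NaN cells; a column with no observed values yields NaN
    # (with a RuntimeWarning) and therefore stays unfilled
    col_means = np.nanmean(filled, axis=0)
    rows, cols = np.where(np.isnan(filled))
    filled[rows, cols] = col_means[cols]
    return filled


nan = np.nan
my_data = np.array([[7, nan, 8, 7],
                    [6, 5, nan, 2],
                    [nan, 2, 2, 5],
                    [1, 3, 4, 1],
                    [2, nan, 2, 1]])
print(fill_with_column_means(my_data))
```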

Available prediction methods for filling data

  • Linear regression
  • Nearest value
  • Slope One (Fastest)
  • Weighted Slope One
  • Bipolar Slope One
  • Means
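The README does not define what "Nearest value" means. One plausible reading, sketched below as an assumption, is to copy the closest observed value in the same column, measured by row distance. The helper fill_nearest_in_column is hypothetical and is not CleanFill's implementation.

```python
import numpy as np


def fill_nearest_in_column(matrix):
    """Replace each NaN with the closest non-NaN value in the same column.

    Hypothetical helper: "closest" means smallest row-index distance,
    with ties resolved toward the earlier row.
    """
    filled = np.array(matrix, dtype=float)  # copy, so the input is untouched
    for col in range(filled.shape[1]):
        column = filled[:, col]  # a view: writes go into `filled`
        known_rows = np.flatnonzero(~np.isnan(column))
        if known_rows.size == 0:
            continue  # nothing to copy from, so the column stays NaN
        for row in np.flatnonzero(np.isnan(column)):
            # argmin returns the first minimum, so ties favour the earlier row
            nearest = known_rows[np.argmin(np.abs(known_rows - row))]
            column[row] = column[nearest]
    return filled


nan = np.nan
data = np.array([[7, nan, 8, 7],
                 [6, 5, nan, 2],
                 [nan, 2, 2, 5],
                 [1, 3, 4, 1],
                 [2, nan, 2, 1]])
print(fill_nearest_in_column(data))
```

Only originally observed rows are used as sources, so the result does not depend on the order in which the gaps are filled.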

Available filling tools

  • NaNtoZero
  • ZeroToNaN
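Judging by their names, NaNtoZero and ZeroToNaN convert between the two ways of marking a missing entry used in the examples below. The following hypothetical NumPy equivalents (zero_to_nan, nan_to_zero) show the idea; they are not the library's functions.

```python
import numpy as np


def zero_to_nan(matrix):
    """Mark 0 entries as missing (hypothetical stand-in for ZeroToNaN)."""
    # Cast to float first: an integer array cannot hold NaN
    values = np.array(matrix, dtype=float)
    values[values == 0] = np.nan
    return values


def nan_to_zero(matrix):
    """Replace NaN entries with 0 (hypothetical stand-in for NaNtoZero)."""
    values = np.array(matrix, dtype=float)
    return np.where(np.isnan(values), 0.0, values)


grid = np.array([[1, 0],
                 [0, 2]])
as_nan = zero_to_nan(grid)
print(as_nan)               # the two 0 cells are now NaN
print(nan_to_zero(as_nan))  # back to the original values, as floats
```

Converting zeros this way treats every 0 as missing, so it only suits data where 0 cannot be a real value (for example, a rating scale that starts at 1).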

Installation

pip install CleanFill

Dependencies

You'll need numpy, scipy and pandas installed in your venv to run this library.

Example with NaN as the missing value (NumPy array)

import numpy as np
from cleanfill import fill

# Missing entries are marked with np.nan (the np.NaN alias was removed in NumPy 2.0)
nan = np.nan
my_data = np.array([[7, nan, 8, 7],
                    [6, 5, nan, 2],
                    [nan, 2, 2, 5],
                    [1, 3, 4, 1],
                    [2, nan, 2, 1]])

# Fill the NaN entries with each prediction method in turn
print(fill.linear(my_data))
print(fill.nearest(my_data))
print(fill.slope_one(my_data))
print(fill.weighted_slope_one(my_data))
print(fill.bipolar_slope_one(my_data))
print(fill.means(my_data))
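To confirm that a method produced a usable result, you can compare its output with the input. Below is a small hypothetical helper, check_fill, written with NumPy only; it checks that the shape is unchanged, that no NaN remains, and that every originally observed value is untouched.

```python
import numpy as np


def check_fill(original, filled):
    """Return True if `filled` is a valid completion of `original`.

    Hypothetical helper: same shape, no NaN left, observed values unchanged.
    """
    original = np.asarray(original, dtype=float)
    filled = np.asarray(filled, dtype=float)
    if original.shape != filled.shape:
        return False
    if np.isnan(filled).any():
        return False  # some cell could not be predicted
    observed = ~np.isnan(original)
    return bool(np.allclose(original[observed], filled[observed]))


nan = np.nan
sample = np.array([[7, nan, 8, 7],
                   [6, 5, nan, 2]])
good = np.array([[7, 5, 8, 7],
                 [6, 5, 8, 2]])
bad = np.array([[7, 5, 8, 7],
                [6, 4, 8, 2]])  # the observed 5 was changed to 4
print(check_fill(sample, good))  # True
print(check_fill(sample, bad))   # False
```

Assuming the fill functions return NumPy arrays for NumPy input, a call such as check_fill(my_data, fill.means(my_data)) would apply the same check to a real result.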

Example with 0 as the missing value (NumPy array)

import numpy as np
from cleanfill import fill

# Here 0 marks a missing entry; dtype=float lets the array hold NaN later
my_data2 = np.array([[7, 0, 8, 7],
                     [6, 5, 0, 2],
                     [0, 2, 2, 5],
                     [1, 3, 4, 1],
                     [2, 0, 2, 1]], dtype=float)

# Convert the 0 entries to NaN so they are treated as missing
my_data2 = fill.ZeroToNaN(my_data2)

print(fill.linear(my_data2))
print(fill.nearest(my_data2))
print(fill.slope_one(my_data2))
print(fill.weighted_slope_one(my_data2))
print(fill.bipolar_slope_one(my_data2))
print(fill.means(my_data2))

Example with NaN as the missing value (pandas DataFrame)

import numpy as np
import pandas as pd
from cleanfill import fill

# Column names must be unique: a repeated dict key silently replaces the earlier column
d = {'name': ['hello', 'mello', 'yellow', 'pink'],
     'number': [6., 4., np.nan, 8.],
     'number2': [7., np.nan, 9., 9.],
     'number3': [np.nan, 5., 9., 10.],
     'number4': [8., np.nan, 7., 5.],
     'number5': [8., 6., np.nan, 5.],
     'number6': [3., 6., 9., np.nan],
     'number7': [np.nan, 2., 10., 1.],
     'number8': [2., 10., np.nan, 3.],
     'number9': [1., 2., 3., np.nan],
     'number10': [8., np.nan, 9., 9.]
     }

df = pd.DataFrame(data=d)

print(fill.linear(df))
print(fill.nearest(df))
print(fill.slope_one(df))
print(fill.weighted_slope_one(df))
print(fill.bipolar_slope_one(df))
print(fill.means(df))
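The README does not say how fill treats the non-numeric name column. If you need to restrict a fill to numeric columns yourself, plain pandas can do it with select_dtypes. The sketch below uses a column-mean fill purely as an example; it is not one of CleanFill's methods, and the reduced three-column frame is for brevity.

```python
import numpy as np
import pandas as pd

d = {'name': ['hello', 'mello', 'yellow', 'pink'],
     'number': [6., 4., np.nan, 8.],
     'number2': [7., np.nan, 9., 9.]}
df = pd.DataFrame(data=d)

# Pick only the numeric columns; 'name' (strings) is left alone
numeric = df.select_dtypes(include='number').columns

filled = df.copy()  # keep the original frame unchanged
# fillna with a Series fills each column with the value for that column label
filled[numeric] = filled[numeric].fillna(filled[numeric].mean())
print(filled)
```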

Example with 0 as the missing value (pandas DataFrame)

import numpy as np
import pandas as pd
from cleanfill import fill

# Column names must be unique: a repeated dict key silently replaces the earlier column
d = {'name': ['hello', 'mello', 'yellow', 'pink'],
     'number': [6., 4., 0, 8.],
     'number2': [7., 0, 9., 9.],
     'number3': [0, 5., 9., 10.],
     'number4': [8., 0, 7., 5.],
     'number5': [8., 6., 0, 5.],
     'number6': [3., 6., 9., 0],
     'number7': [0, 2., 10., 1.],
     'number8': [2., 10., 0, 3.],
     'number9': [1., 2., 3., 0],
     'number10': [8., 0, 9., 9.]
     }

df = pd.DataFrame(data=d)

# Convert the 0 entries to NaN so they are treated as missing
df = fill.ZeroToNaN(df)

print(fill.linear(df))
print(fill.nearest(df))
print(fill.slope_one(df))
print(fill.weighted_slope_one(df))
print(fill.bipolar_slope_one(df))
print(fill.means(df))
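In plain pandas, marking zeros as missing can be done with DataFrame.replace, and isna().sum() counts how many cells each column is missing. This sketch assumes fill.ZeroToNaN does something similar, which the README does not state; it is shown on a reduced frame for brevity.

```python
import numpy as np
import pandas as pd

d = {'name': ['hello', 'mello', 'yellow', 'pink'],
     'number': [6., 4., 0, 8.],
     'number2': [7., 0, 9., 9.]}
df = pd.DataFrame(data=d)

# Treat every 0 as a missing value; replace returns a new frame
df_nan = df.replace(0, np.nan)
print(df_nan.isna().sum())  # missing cells per column
```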

Download files

Source Distribution

cleanfill-0.1.8.tar.gz (7.2 kB)

Uploaded Source

Built Distribution

cleanfill-0.1.8-py3-none-any.whl (5.9 kB)

Uploaded Python 3

File details

Details for the file cleanfill-0.1.8.tar.gz.

File metadata

  • Download URL: cleanfill-0.1.8.tar.gz
  • Upload date:
  • Size: 7.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.12

File hashes

Hashes for cleanfill-0.1.8.tar.gz
Algorithm Hash digest
SHA256 44861e1a2db9d086d00c13b3aeb22292deeaf7b302b2888194f2e7cfab750310
MD5 014e17f692aa7f8e65c56c9e6ae62a29
BLAKE2b-256 fe25130e79a54e5818f12cbcb8c13e0ca054c698d6f320c5eefa851452fdf726

File details

Details for the file cleanfill-0.1.8-py3-none-any.whl.

File metadata

  • Download URL: cleanfill-0.1.8-py3-none-any.whl
  • Upload date:
  • Size: 5.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.12

File hashes

Hashes for cleanfill-0.1.8-py3-none-any.whl
Algorithm Hash digest
SHA256 6cdc620f1f6e96de94c7901f7e2cf8bc58e977c670d1bfb4e1324fe9b3e7b6d8
MD5 58e7abc9cf2f2b2bd0922ce35f1d1605
BLAKE2b-256 b4626a09db65f3799c686e7e2d727a0c9812a9494f88e257c1eda6c14483ee64
