Fills NaN values in NumPy arrays or pandas DataFrames using various prediction algorithms.
Project description
CleanFill
CleanFill is a Python library for filling NaN (Not a Number) values in a matrix using various prediction techniques. Each NaN is replaced with a value predicted by the method you choose. This is useful for collaborative filtering, for example to predict item ratings in a recommendation engine.
Datasets often have entries left unfilled. This is frustrating, because you are sometimes forced to discard a lot of data just because of the missing values. With a library like CleanFill, you can avoid removing this potentially useful data.
It's a simple data transformation tool.
How it works
CleanFill takes a NumPy array (matrix) containing NaN values and fills them with estimated values. For a demonstration, see test.py.
Available prediction methods for filling data
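To make the idea concrete, here is a minimal sketch of the simplest kind of fill: replacing each NaN with the mean of its column. It uses only NumPy. The function name fill_with_column_means is hypothetical, and whether CleanFill's own "Means" method averages by column or by row is an assumption not confirmed here.

```python
import numpy as np

def fill_with_column_means(matrix):
    """Return a float copy of `matrix` with each NaN replaced by its column mean.

    Hypothetical helper for illustration only, not part of CleanFill.
    """
    filled = np.array(matrix, dtype=float)       # copy, so the input is untouched
    column_means = np.nanmean(filled, axis=0)    # per-column mean, ignoring NaN
    rows, cols = np.where(np.isnan(filled))      # positions of the missing entries
    filled[rows, cols] = column_means[cols]
    return filled

data = np.array([[7, np.nan, 8],
                 [6, 5, np.nan],
                 [np.nan, 2, 2]])
result = fill_with_column_means(data)
print(result)
```

The known entries are left unchanged; only the NaN positions receive an estimate.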
- Linear regression
- Nearest value
- Slope One (Fastest)
- Weighted Slope One
- Bipolar Slope One
- Means
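As an illustration of one of the methods listed above, the following is a minimal sketch of the basic (unweighted) Slope One scheme, assuming rows are users and columns are items. It is written from the general description of the algorithm, not taken from CleanFill's source, and the function name slope_one_fill is hypothetical.

```python
import numpy as np

def slope_one_fill(ratings):
    """Fill NaN entries using basic (unweighted) Slope One.

    Hypothetical helper for illustration; rows are users, columns are items.
    """
    ratings = np.asarray(ratings, dtype=float)
    filled = ratings.copy()
    known = ~np.isnan(ratings)
    n_items = ratings.shape[1]

    # dev[j, i]: average of (rating of j - rating of i) over users who rated both
    dev = np.full((n_items, n_items), np.nan)
    for j in range(n_items):
        for i in range(n_items):
            both = known[:, j] & known[:, i]
            if i != j and both.any():
                dev[j, i] = np.mean(ratings[both, j] - ratings[both, i])

    # Predict each missing (user, item) from the items that user did rate
    for u, j in zip(*np.where(~known)):
        usable = known[u] & ~np.isnan(dev[j])
        if usable.any():
            filled[u, j] = np.mean(dev[j, usable] + ratings[u, usable])
    return filled

ratings = np.array([[5, 3, 2],
                    [3, 4, np.nan],
                    [np.nan, 2, 5]])
filled = slope_one_fill(ratings)
print(filled)
```

The weighted variant differs only in the last step: each term is weighted by the number of users who rated both items, instead of taking a plain average.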
Available filling tools
- NaNtoZero
- ZeroToNaN
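The two conversions above can be sketched with plain NumPy as follows. This is an assumption about what such helpers do, not CleanFill's actual implementation, and the names zero_to_nan and nan_to_zero are hypothetical.

```python
import numpy as np

def zero_to_nan(matrix):
    """Return a float copy of `matrix` with every 0 replaced by NaN."""
    values = np.array(matrix, dtype=float)   # NaN needs a float dtype
    values[values == 0] = np.nan
    return values

def nan_to_zero(matrix):
    """Return a float copy of `matrix` with every NaN replaced by 0."""
    values = np.asarray(matrix, dtype=float)
    return np.where(np.isnan(values), 0.0, values)

raw = np.array([[7, 0, 8],
                [0, 5, 2]])
with_nan = zero_to_nan(raw)
back = nan_to_zero(with_nan)
print(with_nan)
print(back)
```

Converting zeros to NaN only makes sense when 0 is used as a "missing" marker and is never a legitimate value in the data.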
Installation
pip install CleanFill
Dependencies
You'll need numpy, scipy and pandas installed in your venv to run this library.
Example for NaN as value with numpy array
import numpy as np
from cleanfill import fill

# np.nan is the portable spelling; the np.NaN alias was removed in NumPy 2.0
nan = np.nan
my_data = np.array([[7, nan, 8, 7],
                    [6, 5, nan, 2],
                    [nan, 2, 2, 5],
                    [1, 3, 4, 1],
                    [2, nan, 2, 1]])

# Fill the missing entries with each available method
print(fill.linear(my_data))
print(fill.nearest(my_data))
print(fill.slope_one(my_data))
print(fill.weighted_slope_one(my_data))
print(fill.bipolar_slope_one(my_data))
print(fill.means(my_data))
Example for 0 as value with numpy array
import numpy as np
from cleanfill import fill

my_data2 = np.array([[7, 0, 8, 7],
                     [6, 5, 0, 2],
                     [0, 2, 2, 5],
                     [1, 3, 4, 1],
                     [2, 0, 2, 1]])

# Convert the 0 placeholders to NaN before filling
my_data2 = fill.ZeroToNaN(my_data2)
print(fill.linear(my_data2))
print(fill.nearest(my_data2))
print(fill.slope_one(my_data2))
print(fill.weighted_slope_one(my_data2))
print(fill.bipolar_slope_one(my_data2))
print(fill.means(my_data2))
Example for NaN as value with pandas dataframe
import numpy as np
import pandas as pd
from cleanfill import fill

# Column names must be unique: a repeated dict key silently overwrites the earlier one
d = {'name': ['hello', 'mello', 'yellow', 'pink'],
     'number': [6., 4., np.nan, 8.],
     'number2': [7., np.nan, 9., 9.],
     'number3': [np.nan, 5., 9., 10.],
     'number4': [8., np.nan, 7., 5.],
     'number5': [8., 6., np.nan, 5.],
     'number6': [3., 6., 9., np.nan],
     'number7': [np.nan, 2., 10., 1.],
     'number8': [2., 10., np.nan, 3.],
     'number9': [1., 2., 3., np.nan],
     'number10': [8., np.nan, 9., 9.]
     }
df = pd.DataFrame(data=d)

print(fill.linear(df))
print(fill.nearest(df))
print(fill.slope_one(df))
print(fill.weighted_slope_one(df))
print(fill.bipolar_slope_one(df))
print(fill.means(df))
Example for 0 as value with pandas dataframe
import numpy as np
import pandas as pd
from cleanfill import fill

# Column names must be unique: a repeated dict key silently overwrites the earlier one
d = {'name': ['hello', 'mello', 'yellow', 'pink'],
     'number': [6., 4., 0, 8.],
     'number2': [7., 0, 9., 9.],
     'number3': [0, 5., 9., 10.],
     'number4': [8., 0, 7., 5.],
     'number5': [8., 6., 0, 5.],
     'number6': [3., 6., 9., 0],
     'number7': [0, 2., 10., 1.],
     'number8': [2., 10., 0, 3.],
     'number9': [1., 2., 3., 0],
     'number10': [8., 0, 9., 9.]
     }
df = pd.DataFrame(data=d)

# Convert the 0 placeholders to NaN before filling
df = fill.ZeroToNaN(df)
print(fill.linear(df))
print(fill.nearest(df))
print(fill.slope_one(df))
print(fill.weighted_slope_one(df))
print(fill.bipolar_slope_one(df))
print(fill.means(df))