Skip to main content

Fill-in missing values using data mean and correlation

Project description

Linear Data Imputation

This allows you to fill-in missing values from your data. It uses a distribution estimated from the mean and covariance of your data.

Installation

pip install linear-imputation

How to use

You might have some data which is missing some values:

>>> import pandas as pd
>>> import numpy as np
>>> from linear_imputation import impute, Imputer
>>> 
>>> input_data = pd.DataFrame({'age': [10,20,30], 'pets':[100,200,None]})
>>> input_data                                                                                        
   age   pets
0   10  100.0
1   20  200.0
2   30    NaN

To fill-in the missing values of your data, you only have to call the impute function:

>>> impute(input_data) 
    age   pets
0  10.0  100.0
1  20.0  200.0
2  30.0  187.5

The filled-in values are considered the most likely, given the distribution of your data.

Sometimes it is useful to build a model from some training data to later apply this model to some other data. Building a model is easy:

>>> model = Imputer(input_data)

You can then use it to fill-in missing values of other data you have:

>>> marty = {'name': "Marty", 'age': None, 'pets': 150} 
>>> model.impute(marty) 
{'name': 'Marty', 'age': 20.0, 'pets': 150}

The data to be completed can also be a pandas.DataFrame

>>> df = pd.DataFrame([marty, {'name': 'Tom', 'age': 35}]) 
>>> model.impute(df) 
    name   age    pets
0  Marty  20.0  150.00
1    Tom  35.0  206.25

You can also use a numpy.ndarray

>>> matrix = np.array([[10,100], [20, 200], [30, None]]) 
>>> impute(matrix)
array([[10, 100],
       [20, 200],
       [30, 187.5]])

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

linear-imputation-1.0.1.tar.gz (3.3 kB view hashes)

Uploaded Source

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page