Fill in missing values using data mean and correlation

Project description

Linear Data Imputation

This package fills in missing values in your data, using a distribution estimated from the data's mean and covariance.

Installation

pip install linear-imputation

How to use

Suppose your data has some missing values:

>>> import pandas as pd
>>> import numpy as np
>>> from linear_imputation import impute, Imputer
>>>
>>> input_data = pd.DataFrame({'age': [10, 20, 30], 'pets': [100, 200, None]})
>>> input_data
   age   pets
0   10  100.0
1   20  200.0
2   30    NaN

To fill in the missing values, call the impute function:

>>> impute(input_data) 
    age   pets
0  10.0  100.0
1  20.0  200.0
2  30.0  187.5

The filled-in values are the ones considered most likely, given the distribution estimated from your data.
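
The package's internals are not shown here, so the following is only a minimal sketch of one way a "most likely" value can be computed. It assumes the data follows a multivariate normal distribution, in which case the most likely value of a missing entry is its conditional mean given the observed entries. The function name conditional_mean_impute and the training numbers are made up for illustration and are not part of linear_imputation.

```python
import numpy as np

def conditional_mean_impute(row, mean, cov):
    """Fill NaN entries of `row` with their conditional mean under N(mean, cov)."""
    row = np.asarray(row, dtype=float)
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    missing = np.isnan(row)
    if not missing.any() or missing.all():
        # Nothing to fill, or nothing observed: keep the row or fall back to the mean.
        return np.where(missing, mean, row)
    observed = ~missing
    # Blocks of the covariance matrix needed for the conditional mean.
    cov_oo = cov[np.ix_(observed, observed)]
    cov_mo = cov[np.ix_(missing, observed)]
    deviation = row[observed] - mean[observed]
    filled = row.copy()
    # E[x_m | x_o] = mean_m + cov_mo @ inv(cov_oo) @ (x_o - mean_o)
    filled[missing] = mean[missing] + cov_mo @ np.linalg.solve(cov_oo, deviation)
    return filled

# Illustrative complete data (made up): pets is exactly 10 * age.
complete = np.array([[10.0, 100.0], [20.0, 200.0], [30.0, 300.0]])
mean = complete.mean(axis=0)
cov = np.cov(complete, rowvar=False)

result = conditional_mean_impute([40.0, np.nan], mean, cov)
print(result)  # close to [40., 400.]
```

Because this sketch estimates the mean and covariance from complete rows only, it is not expected to reproduce the exact numbers that impute returns in the examples here.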

Sometimes it is useful to build a model from training data and apply it later to other data. Building a model is easy:

>>> model = Imputer(input_data)

You can then use it to fill in missing values in other data:

>>> marty = {'name': "Marty", 'age': None, 'pets': 150} 
>>> model.impute(marty) 
{'name': 'Marty', 'age': 20.0, 'pets': 150}
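
The build-once, apply-later pattern above can be sketched as follows. This is a hypothetical stand-in, not the package's Imputer: the class name SketchImputer, its constructor arguments, and the training numbers are assumptions made for illustration. It stores the mean and covariance when built, then reuses them to fill the missing numeric fields of a record, leaving other keys such as 'name' untouched.

```python
import numpy as np

class SketchImputer:
    """Hypothetical stand-in: estimates mean and covariance once, reuses them later."""

    def __init__(self, training, columns):
        training = np.asarray(training, dtype=float)
        self.columns = list(columns)
        self.mean = training.mean(axis=0)
        self.cov = np.cov(training, rowvar=False)

    def impute(self, record):
        # Copy so the caller's dict is untouched; unknown keys pass through as-is.
        out = dict(record)
        values = np.array(
            [np.nan if record.get(c) is None else record[c] for c in self.columns],
            dtype=float,
        )
        missing = np.isnan(values)
        if missing.all():
            # Nothing observed: the best guess is the stored mean.
            values = self.mean.copy()
        elif missing.any():
            obs = ~missing
            # Conditional mean of the missing fields given the observed ones.
            coef = np.linalg.solve(self.cov[np.ix_(obs, obs)],
                                   values[obs] - self.mean[obs])
            values[missing] = self.mean[missing] + self.cov[np.ix_(missing, obs)] @ coef
        for column, value, was_missing in zip(self.columns, values, missing):
            if was_missing:
                out[column] = float(value)
        return out

# Made-up complete training data: pets is exactly 10 * age.
sketch = SketchImputer([[10, 100], [20, 200], [30, 300]], ['age', 'pets'])
filled = sketch.impute({'name': 'Marty', 'age': None, 'pets': 150})
print(filled)
```

Keeping the estimated parameters on the object means new records are scored against the training distribution only, which is the point of separating model building from imputation.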

The data to be completed can also be a pandas.DataFrame:

>>> df = pd.DataFrame([marty, {'name': 'Tom', 'age': 35}]) 
>>> model.impute(df) 
    name   age    pets
0  Marty  20.0  150.00
1    Tom  35.0  206.25

You can also use a numpy.ndarray:

>>> matrix = np.array([[10,100], [20, 200], [30, None]]) 
>>> impute(matrix)
array([[10, 100],
       [20, 200],
       [30, 187.5]])

Source distribution: linear-imputation-1.0.1.tar.gz (3.3 kB)
