Skip to main content

Data Imputer API

Project description

Data Imputer API in Python

Check out the Wiki here.

'imputerApi' Documentation.

Currently Supported Strategies:

  • Mean
  • Median
  • Most-Frequent
  • Constant

Usage:

Read from csv file:

from imputerApi import ImputerApi
# Create instance of class
imm_api = ImputerApi(path_to_file="data.csv",strategy='mean', headers=True)
# Print data in console
imm_api.print_table(imm_api.data)
# Transform data by replacing missing values with mean
# and selecting only columns Age and Salary with indexes 1 and 2
replaced_data = imm_api.transform(column_indexes=[1, 2])
# Print repalced data in console
imm_api.print_table(replaced_data)
# Write new data to csv file
imm_api.dump_data_to_csv('datanew_mean.csv', replaced_data,use_header_from_data=True, override=True)

Read from a Two Dimensional Matrix (Python List):

from imputerApi import ImputerApi
matrix_2d = [
    ['Country', 'Age', 'Salary', 'Purchased'],
    ['France', 44, 72000, 'No'],
    ['Spain', 27, 48000, 'Yes'],
    ['Germany', 30, 54000, 'No'],
    ['Spain', 38, 61000, 'No'],
    ['Germany', 40, '', 'Yes'],
    ['France', 35, 58000, 'Yes'],
    ['Spain', '', 52000, 'No'],
    ['France', 48, 79000, 'Yes'],
    ['Germany', 50, 83000, 'No'],
    ['France', 37, 67000, 'Yes']
]
# Create instance of class
imm_api = ImputerApi(matrix_2D=matrix_2d, strategy='median', headers=True)
# Print data in console
imm_api.print_table(imm_api.data)
# Transform data by replacing missing values with median
# and selecting only columns Age and Salary
replaced_data = imm_api.transform(columns_by_header_name=["Age","Salary"])
# Print repalced data in console
imm_api.print_table(replaced_data)
# Write new data to csv file
imm_api.dump_data_to_csv('datanew_median.csv', replaced_data,use_header_from_data=True,override=True)
# Create instance with strategy most-frequent
imm_api_most_freq = ImputerApi(path_to_file='datanew_median.csv',strategy="most-frequent",headers=True)
imm_api_most_freq.print_table(imm_api_most_freq.data)
# Transform data by replacing missing values with most-frequent
# and selecting only column Purchased
replaced_data = imm_api_most_freq.transform(columns_by_header_name=["Purchased"])
imm_api_most_freq.print_table(replaced_data)
# Write new table to csv file
imm_api_most_freq.dump_data_to_csv('datanew_most_frequent.csv', replaced_data,
                         use_header_from_data=True, override=True)

Integrating with pandas,numpy:

from imputerApi import ImputerApi
import numpy as np
import pandas as pd
# Read csv data as Pandas DataFrame
df = pd.read_csv('data.csv')
# Convert Pandas Dataframe to Numpy Array
arr = df.values
# Convert Numpy Array to Python List 
arr_list = arr.tolist()
# Pass List to ImputerApi in parameter matrix_2D ; headers = False since it is 2D array
imputer_api = ImputerApi(matrix_2D=arr_list,strategy="mean",headers=False)
# Replacing missing value 'np.nan' with mean
replaced_data = imputer_api.transform(column_indexes=[1,2],missing_value=np.nan)
# Print to console
imputer_api.print_table(arr_2D=replaced_data)
# Write data to CSV file2
imputer_api.dump_data_to_csv("data2.csv",replaced_data,override=True)

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

ImputerApi-0.0.2.tar.gz (6.3 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

ImputerApi-0.0.2-py3-none-any.whl (7.5 kB view details)

Uploaded Python 3

File details

Details for the file ImputerApi-0.0.2.tar.gz.

File metadata

  • Download URL: ImputerApi-0.0.2.tar.gz
  • Upload date:
  • Size: 6.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.5.0 pkginfo/1.7.0 requests/2.20.1 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.7.4

File hashes

Hashes for ImputerApi-0.0.2.tar.gz
Algorithm Hash digest
SHA256 46a72ccad350b20330dcfc74980b167e11e59a60b97a0e42ea431c2b6c944d9e
MD5 116559aff546300c158857c496b37bc0
BLAKE2b-256 7310115126c459a3b2ae4a8ec82810b697ba021e906758665893203734e75246

See more details on using hashes here.

File details

Details for the file ImputerApi-0.0.2-py3-none-any.whl.

File metadata

  • Download URL: ImputerApi-0.0.2-py3-none-any.whl
  • Upload date:
  • Size: 7.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.5.0 pkginfo/1.7.0 requests/2.20.1 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.7.4

File hashes

Hashes for ImputerApi-0.0.2-py3-none-any.whl
Algorithm Hash digest
SHA256 e4a7ffd9a4593ea8ff95e3ee563a0522d1008b380eda4735a66d2715649db4ee
MD5 575790e3449666865a56926a3d0cb2e6
BLAKE2b-256 1dd659ca9fd4b2687e5f002deabe591e484db2c9666254d3aba7f1a8ee92494d

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page