Skip to main content

Data Imputer API

Project description

Data Imputer API in Python

forthebadge made-with-python
Generic badge

Check out the Wiki here.

'imputerApi' Documentation.

Currently Supported Strategies:

  • Mean
  • Median
  • Most-Frequent
  • Constant
  • K Nearest Neighbors

Usage:

Read from csv file:

from imputerApi import ImputerApi
# Create instance of class
imm_api = ImputerApi(path_to_file="data.csv",strategy='mean', headers=True)
# Print data in console
imm_api.print_table(imm_api.data)
# Transform data by replacing missing values with mean
# and selecting only columns Age and Salary with indexes 1 and 2
replaced_data = imm_api.transform(column_indexes=[1, 2])
# Print repalced data in console
imm_api.print_table(replaced_data)
# Write new data to csv file
imm_api.dump_data_to_csv('datanew_mean.csv', replaced_data,use_header_from_data=True, override=True)

Read from a Two Dimensional Matrix (Python List):

from imputerApi import ImputerApi
matrix_2d = [
    ['Country', 'Age', 'Salary', 'Purchased'],
    ['France', 44, 72000, 'No'],
    ['Spain', 27, 48000, 'Yes'],
    ['Germany', 30, 54000, 'No'],
    ['Spain', 38, 61000, 'No'],
    ['Germany', 40, '', 'Yes'],
    ['France', 35, 58000, 'Yes'],
    ['Spain', '', 52000, 'No'],
    ['France', 48, 79000, 'Yes'],
    ['Germany', 50, 83000, 'No'],
    ['France', 37, 67000, 'Yes']
]
# Create instance of class
imm_api = ImputerApi(matrix_2D=matrix_2d, strategy='median', headers=True)
# Print data in console
imm_api.print_table(imm_api.data)
# Transform data by replacing missing values with median
# and selecting only columns Age and Salary
replaced_data = imm_api.transform(columns_by_header_name=["Age","Salary"])
# Print repalced data in console
imm_api.print_table(replaced_data)
# Write new data to csv file
imm_api.dump_data_to_csv('datanew_median.csv', replaced_data,use_header_from_data=True,override=True)
# Create instance with strategy most-frequent
imm_api_most_freq = ImputerApi(path_to_file='datanew_median.csv',strategy="most-frequent",headers=True)
imm_api_most_freq.print_table(imm_api_most_freq.data)
# Transform data by replacing missing values with most-frequent
# and selecting only column Purchased
replaced_data = imm_api_most_freq.transform(columns_by_header_name=["Purchased"])
imm_api_most_freq.print_table(replaced_data)
# Write new table to csv file
imm_api_most_freq.dump_data_to_csv('datanew_most_frequent.csv', replaced_data,
                         use_header_from_data=True, override=True)

Integrating with pandas,numpy:

from imputerApi import ImputerApi
import numpy as np
import pandas as pd
# Read csv data as Pandas DataFrame
df = pd.read_csv('data.csv')
# Convert Pandas Dataframe to Numpy Array
arr = df.values
# Convert Numpy Array to Python List 
arr_list = arr.tolist()
# Pass List to ImputerApi in parameter matrix_2D ; headers = False since it is 2D array
imputer_api = ImputerApi(matrix_2D=arr_list,strategy="mean",headers=False)
# Replacing missing value 'np.nan' with mean
replaced_data = imputer_api.transform(column_indexes=[1,2],missing_value=np.nan)
# Print to console
imputer_api.print_table(arr_2D=replaced_data)
# Write data to CSV file2
imputer_api.dump_data_to_csv("data2.csv",replaced_data,override=True)

Using K-Nearest Neighbors

# Loading Data
imputer_api= ImputerApi("data.csv",strategy="knn",headers=True)
# Imputing Purchased Column containing Text Categorical Values 
# using knn technique and distance method 'Levenshtein'
replaced_data = imputer_api.transform(columns_by_header_name=["Purchased"],missing_value="",knn_method="levenshtein",knn_selection="most-frequent")
# Creating new instance of ImputerApi using replaced_data
imputer_api2 = ImputerApi(matrix_2D=replaced_data,strategy="knn",headers=False)
# Imputing colums 1 and 2 using knn and distance method 'Eucilidian'
replaced_data = imputer_api2.transform(column_indexes=[1,2],missing_value="",knn_method="Euclidian",knn_selection="median")
# Writing replaced data to file
imputer_api.dump_data_to_csv("data2.csv",replaced_data,override=True,use_header_from_data=True)

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

ImputerApi-0.0.3.tar.gz (7.9 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

ImputerApi-0.0.3-py3-none-any.whl (9.0 kB view details)

Uploaded Python 3

File details

Details for the file ImputerApi-0.0.3.tar.gz.

File metadata

  • Download URL: ImputerApi-0.0.3.tar.gz
  • Upload date:
  • Size: 7.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.5.0 pkginfo/1.7.0 requests/2.20.1 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.7.4

File hashes

Hashes for ImputerApi-0.0.3.tar.gz
Algorithm Hash digest
SHA256 3d751f36114615cc7c407ccef03eaeb36cdc96c893aaf59148ee8c003995b6b9
MD5 f67f78f41c2fbe2dfc68531b16a86fe8
BLAKE2b-256 20c799973278bc6d8ee424322ea0e93c846b9362dd82aa7d4a1990e4aea8ea0c

See more details on using hashes here.

File details

Details for the file ImputerApi-0.0.3-py3-none-any.whl.

File metadata

  • Download URL: ImputerApi-0.0.3-py3-none-any.whl
  • Upload date:
  • Size: 9.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.5.0 pkginfo/1.7.0 requests/2.20.1 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.7.4

File hashes

Hashes for ImputerApi-0.0.3-py3-none-any.whl
Algorithm Hash digest
SHA256 5aa0ba263a671b260d64bfd5d78e6ff959bc6fd1b7ccc14a04bff43a2721d443
MD5 34f79c115384524c112060ab5d1a1eb9
BLAKE2b-256 b1dc10c845f8134a29e17f369a5d2e1d6aa1975898fa4a47a20d3b3d569a50c0

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page