A comprehensive toolkit for preprocessing datasets, including data reading, data summary generation, handling missing values, and categorical data encoding.
PrepDataKit
PrepDataKit is a Python package for preprocessing datasets. It provides functions for reading data from different file formats, summarizing datasets, handling missing values, and encoding categorical data.
Installation
You can install PrepDataKit using pip:
pip install prepdatakit
Sample Data
Category | Price | In Stock | Description |
---|---|---|---|
Fruit | 2.50 | True | Ripe and delicious |
Animal | None | False | Needs more data |
Color | 1.99 | | Vivid and bright |
Tool | 9.99 | True | Heavy duty and reliable (Maybe) |
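To try the package on the sample table, the rows can first be written to a CSV file. The sketch below uses only the standard library. The file name `sample_data.csv` and the helper `write_sample_csv` are hypothetical, and it assumes that `None` in the table and the blank "In Stock" cell of the Color row both mean a missing value, written here as an empty field.

```python
import csv

# Rows mirror the sample table. Empty strings stand in for missing cells
# (assumption: "None" and blank cells in the table both mean "missing").
ROWS = [
    {"Category": "Fruit", "Price": "2.50", "In Stock": "True",
     "Description": "Ripe and delicious"},
    {"Category": "Animal", "Price": "", "In Stock": "False",
     "Description": "Needs more data"},
    {"Category": "Color", "Price": "1.99", "In Stock": "",
     "Description": "Vivid and bright"},
    {"Category": "Tool", "Price": "9.99", "In Stock": "True",
     "Description": "Heavy duty and reliable (Maybe)"},
]


def write_sample_csv(path="sample_data.csv"):
    """Write the sample rows to a CSV file and return the path.

    Hypothetical helper for illustration; it is not part of PrepDataKit.
    """
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(ROWS[0].keys()))
        writer.writeheader()
        writer.writerows(ROWS)
    return path


if __name__ == "__main__":
    print("Wrote", write_sample_csv())
```

The resulting file can then be passed to `prepdatakit.read_file` in place of `reviews.csv`.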
Usage
Here's an example of how to use PrepDataKit:
from prepdatakit import prepdatakit

if __name__ == "__main__":
    # Reading the file
    data = prepdatakit.read_file("reviews.csv")
    print("Data Information:")
    print(prepdatakit.tabulate(data.head(), headers="keys", tablefmt="fancy_grid"))
    print("\nData Type:", type(data))
    print("Data Shape:", data.shape)
    print("=" * 50)

    # Generating summary
    summary = prepdatakit.get_summary(data)
    print("\nSummary Statistics:")
    for key, value in summary.items():
        print(key + ":")
        if isinstance(value, prepdatakit.pd.DataFrame):
            print(prepdatakit.tabulate(value, headers="keys", tablefmt="fancy_grid"))
        elif isinstance(value, dict):
            for k, v in value.items():
                print(f" {k}: {v}")
        print("-" * 50)

    # Handling missing values (the cleaned table is written to clean_data.txt)
    clean_data = prepdatakit.handle_missing_values(data, strategy="remove")
    print("\nCleaned Data:")
    # print(tabulate(clean_data.head(), headers='keys', tablefmt='fancy_grid'))
    with open("clean_data.txt", "w", encoding="utf-8") as f:
        f.write(prepdatakit.tabulate(clean_data, headers="keys", tablefmt="fancy_grid"))
    print("\nData Type:", type(clean_data))

    # Encoding categorical data (the encoded table is written to encoded_data.txt)
    encoded_data = prepdatakit.one_hot_encode(clean_data)
    print("\nEncoded Data:")
    with open("encoded_data.txt", "w", encoding="utf-8") as f:
        f.write(prepdatakit.tabulate(encoded_data, headers="keys", tablefmt="psql"))
    # print(tabulate(encoded_data.head(), headers='keys', tablefmt='plain'))
    print("\nData Type:", type(encoded_data))
    print("Data Shape:", encoded_data.shape)
    print("=" * 50)
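For readers who want to see what the last two steps amount to, here is a minimal sketch in plain pandas. It is not PrepDataKit's implementation: it assumes that `strategy="remove"` drops rows containing missing values and that one-hot encoding behaves like `pandas.get_dummies`. The helpers `remove_missing` and `one_hot` and the small DataFrame are hypothetical and exist only for illustration.

```python
import pandas as pd

# Hypothetical stand-in for three columns of the sample data;
# None marks a missing cell.
data = pd.DataFrame(
    {
        "Category": ["Fruit", "Animal", "Color", "Tool"],
        "Price": [2.50, None, 1.99, 9.99],
        "In Stock": [True, False, None, True],
    }
)


def remove_missing(df):
    """Drop every row that contains at least one missing value."""
    return df.dropna().reset_index(drop=True)


def one_hot(df, columns):
    """One-hot encode the named columns with pandas.get_dummies."""
    return pd.get_dummies(df, columns=columns)


clean = remove_missing(data)
encoded = one_hot(clean, ["Category"])

print(clean)
print(encoded)
```

Dropping rows is the simplest way to handle missing values, but it discards data: only the Fruit and Tool rows survive here. The actual behavior of `handle_missing_values` and `one_hot_encode` may differ, so check the package source before relying on this equivalence.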