
A comprehensive toolkit for preprocessing datasets, including data reading, data summary generation, handling missing values, and categorical data encoding.

Project description

PrepDataKit

PrepDataKit is a Python package for preprocessing tabular datasets. It provides functions for reading data from several file formats (read_file), summarizing a dataset (get_summary), handling missing values (handle_missing_values), and one-hot encoding categorical data (one_hot_encode).
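The usage example treats the result of get_summary as a dict whose values are either DataFrames or plain dicts. As a rough illustration of that shape only, here is a hypothetical summarize function built on pandas; its keys and contents are assumptions, not PrepDataKit's actual output.

```python
import pandas as pd


def summarize(df: pd.DataFrame) -> dict:
    """Hypothetical summary (not PrepDataKit's): every value is either
    a DataFrame or a plain dict, matching what the usage example expects."""
    return {
        # Count, mean, std, min, quartiles, and max for numeric columns
        "Descriptive Statistics": df.describe(),
        # Number of missing values in each column
        "Missing Values": df.isna().sum().to_dict(),
        # Column name -> dtype name
        "Data Types": {col: str(dtype) for col, dtype in df.dtypes.items()},
    }


# Example: a small frame with one missing price
df = pd.DataFrame({"Price": [2.50, None, 1.99], "Category": ["Fruit", "Animal", "Color"]})
summary = summarize(df)
for key, value in summary.items():
    print(key + ":")
    print(value)
```

Returning DataFrames for tabular statistics and dicts for per-column facts is what lets a loop like the one in the usage example print each entry with the appropriate formatter.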

Installation

You can install PrepDataKit using pip:

pip install prepdatakit

Sample Data

Category | Price | In Stock | Description
Fruit    | 2.50  | True     | Ripe and delicious
Animal   | None  | False    | Needs more data
Color    | 1.99  |          | Vivid and bright
Tool     | 9.99  | True     | Heavy duty and reliable (Maybe)
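To check which cells in the sample data count as missing, the rows can be loaded with plain pandas. This sketch embeds the sample rows as an inline CSV string (an assumption; the downloadable file's exact layout may differ) and treats both the literal text None and empty fields as missing.

```python
import io

import pandas as pd

# The sample table as CSV text; the Color row has no "In Stock" value
SAMPLE_CSV = """Category,Price,In Stock,Description
Fruit,2.50,True,Ripe and delicious
Animal,None,False,Needs more data
Color,1.99,,Vivid and bright
Tool,9.99,True,Heavy duty and reliable (Maybe)
"""

# na_values=["None"] makes the literal string "None" parse as missing;
# empty fields are already treated as missing by default
sample = pd.read_csv(io.StringIO(SAMPLE_CSV), na_values=["None"])

# Count missing values per column
print(sample.isna().sum())

# Rows that contain at least one missing value
print(sample[sample.isna().any(axis=1)])
```

Two rows (Animal and Color) each have one missing cell, so any strategy that removes incomplete rows would keep only Fruit and Tool.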

Usage

Here's an example of how to use PrepDataKit:

import pandas as pd
from tabulate import tabulate

from prepdatakit import prepdatakit

if __name__ == "__main__":
    # Read the file and preview the first rows
    data = prepdatakit.read_file("reviews.csv")
    print("Data Information:")
    print(tabulate(data.head(), headers="keys", tablefmt="fancy_grid"))
    print("\nData Type:", type(data))
    print("Data Shape:", data.shape)
    print("=" * 50)

    # Generate the summary: a dict whose values are DataFrames or dicts
    summary = prepdatakit.get_summary(data)
    print("\nSummary Statistics:")
    for key, value in summary.items():
        print(key + ":")
        if isinstance(value, pd.DataFrame):
            print(tabulate(value, headers="keys", tablefmt="fancy_grid"))
        elif isinstance(value, dict):
            for k, v in value.items():
                print(f"  {k}: {v}")
        else:
            # Print any other value type as-is instead of silently skipping it
            print(f"  {value}")
        print("-" * 50)

    # Handle missing values with the "remove" strategy
    clean_data = prepdatakit.handle_missing_values(data, strategy="remove")
    print("\nCleaned Data:")
    print(tabulate(clean_data.head(), headers="keys", tablefmt="fancy_grid"))
    # Save the full cleaned table to a text file
    with open("clean_data.txt", "w", encoding="utf-8") as f:
        f.write(tabulate(clean_data, headers="keys", tablefmt="fancy_grid"))
    print("\nData Type:", type(clean_data))

    # Encode categorical data with one-hot encoding
    encoded_data = prepdatakit.one_hot_encode(clean_data)
    print("\nEncoded Data:")
    print(tabulate(encoded_data.head(), headers="keys", tablefmt="psql"))
    # Save the full encoded table to a text file
    with open("encoded_data.txt", "w", encoding="utf-8") as f:
        f.write(tabulate(encoded_data, headers="keys", tablefmt="psql"))
    print("\nData Type:", type(encoded_data))
    print("Data Shape:", encoded_data.shape)
    print("=" * 50)
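To see what the last two steps might do to a small table, here is a pandas-only approximation on the sample data. It assumes, without confirmation from the package, that handle_missing_values(data, strategy="remove") drops every row containing a missing value and that one_hot_encode expands a categorical column into indicator columns the way pandas.get_dummies does. Only the Category column is encoded here to keep the output small; PrepDataKit's actual column selection and naming may differ.

```python
import pandas as pd

# The sample table, with None marking missing values
data = pd.DataFrame(
    {
        "Category": ["Fruit", "Animal", "Color", "Tool"],
        "Price": [2.50, None, 1.99, 9.99],
        "In Stock": [True, False, None, True],
        "Description": [
            "Ripe and delicious",
            "Needs more data",
            "Vivid and bright",
            "Heavy duty and reliable (Maybe)",
        ],
    }
)

# Assumed behavior of strategy="remove": drop rows containing any missing value
clean = data.dropna().reset_index(drop=True)

# Assumed behavior of one_hot_encode: one 0/1 indicator column per category value
encoded = pd.get_dummies(clean, columns=["Category"], dtype=int)

print(encoded)
```

Dropping incomplete rows is the simplest strategy but discards data; here it removes half of the sample. One-hot encoding then replaces the Category text with one numeric column per distinct value that survives cleaning.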

Download files

Source Distribution

prepdatakit-1.5.8.tar.gz (3.5 kB)

Built Distribution

prepdatakit-1.5.8-py3-none-any.whl (3.7 kB)