A Python toolkit for preprocessing datasets.
PrepDataKit
PrepDataKit is a Python toolkit for preprocessing datasets. It provides functions for reading data from different file formats, summarizing datasets, handling missing values, and encoding categorical data.
Installation
You can install PrepDataKit using pip:
pip install prepdatakit
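To confirm that the install landed in the environment you expect, a small check with the standard library's `importlib.metadata` can help. This is only a sketch: `installed_version` is a hypothetical helper name, and the distribution name `prepdatakit` is taken from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("prepdatakit")
    print(found if found else "prepdatakit is not installed in this environment")
```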
Sample Data
Category | Price | In Stock | Description |
---|---|---|---|
Fruit | 2.50 | True | Ripe and delicious |
Animal | None | False | Needs more data |
Color | 1.99 | | Vivid and bright |
Tool | 9.99 | True | Heavy duty and reliable (Maybe) |
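If you want a file to experiment with, the sample table can be written out as CSV using only the standard library. The sketch below rests on several assumptions: the column names are copied from the table header, the `None` price and the blank "In Stock" cell are treated as missing values and written as empty fields, and `sample_csv_text` and `write_sample_csv` are hypothetical helper names. How PrepDataKit interprets empty fields is not documented here.

```python
import csv
import io

# Rows mirror the sample table; empty strings stand in for the missing cells
# (an assumption about how the table is meant to be read).
HEADER = ["Category", "Price", "In Stock", "Description"]
ROWS = [
    ["Fruit", "2.50", "True", "Ripe and delicious"],
    ["Animal", "", "False", "Needs more data"],
    ["Color", "1.99", "", "Vivid and bright"],
    ["Tool", "9.99", "True", "Heavy duty and reliable (Maybe)"],
]


def sample_csv_text():
    """Return the sample table as CSV text with a header row."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(HEADER)
    writer.writerows(ROWS)
    return buffer.getvalue()


def write_sample_csv(path="data.csv"):
    """Write the sample table to `path` (hypothetical default: data.csv)."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        handle.write(sample_csv_text())


if __name__ == "__main__":
    write_sample_csv()
```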
Usage
Here's an example of how to use PrepDataKit:
from prepdatakit import prepdatakit
import time
# Read a CSV file
data = prepdatakit.read_file('data.csv')
print("Start after loading the file, summary")
# Get summary statistics
summary = prepdatakit.get_summary(data)
print(summary)
print("Finish summary")
time.sleep(0.5)
# Handle missing values
print("Start clean_data")
clean_data = prepdatakit.handle_missing_values(data, strategy='remove')
print(clean_data)
print("Finish clean_data")
time.sleep(0.5)
# Encode categorical data (column name as it appears in the sample table)
print("Start encoded_data")
encoded_data = prepdatakit.one_hot_encode(clean_data, columns=['Category'])
print(encoded_data)
print("Finish encoded_data")
time.sleep(0.5)
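For readers unfamiliar with the last two steps, the plain-Python sketch below illustrates what a "remove" strategy for missing values and one-hot encoding usually mean. It is not PrepDataKit's implementation: `drop_incomplete_rows` and `one_hot` are hypothetical helpers, and the rows are a trimmed version of the sample table.

```python
def drop_incomplete_rows(rows):
    """Keep only rows in which no value is None or an empty string."""
    return [
        row for row in rows
        if all(value is not None and value != "" for value in row.values())
    ]


def one_hot(rows, column):
    """Replace `column` with one 0/1 indicator column per distinct value."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Copy every field except the one being encoded.
        new_row = {key: value for key, value in row.items() if key != column}
        for category in categories:
            new_row[f"{column}_{category}"] = int(row[column] == category)
        encoded.append(new_row)
    return encoded


rows = [
    {"Category": "Fruit", "Price": 2.50, "In Stock": True},
    {"Category": "Animal", "Price": None, "In Stock": False},
    {"Category": "Tool", "Price": 9.99, "In Stock": True},
]

clean = drop_incomplete_rows(rows)   # the "Animal" row has no price and is dropped
encoded = one_hot(clean, "Category")
print(encoded)
```

Each remaining row gains one indicator column per category (`Category_Fruit`, `Category_Tool`), and the original `Category` column is removed.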