A comprehensive toolkit for preprocessing datasets, including data reading, data summary generation, handling missing values, and categorical data encoding.
PrepDataKit
PrepDataKit is a Python package for preprocessing datasets. It provides functions for reading data from different file formats, summarizing datasets, handling missing values, and encoding categorical data.
Installation
You can install PrepDataKit using pip:
pip install prepdatakit
Sample Data
| Category | Price | In Stock | Description |
|---|---|---|---|
| Fruit | 2.50 | True | Ripe and delicious |
| Animal | None | False | Needs more data |
| Color | 1.99 | | Vivid and bright |
| Tool | 9.99 | True | Heavy duty and reliable (Maybe) |
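To try the functions on this sample, the table can first be written to a CSV file. The sketch below uses only the standard library; the helper name `write_sample_csv` is hypothetical, and it assumes that the `None` price and the blank cell in the Color row (taken here to be the In Stock value) are both missing values, written as empty fields.

```python
import csv

FIELDS = ["Category", "Price", "In Stock", "Description"]

# Rows mirror the Sample Data table; empty strings stand in for missing cells
# (an assumption about how the missing values should be represented).
SAMPLE_ROWS = [
    ["Fruit", "2.50", "True", "Ripe and delicious"],
    ["Animal", "", "False", "Needs more data"],
    ["Color", "1.99", "", "Vivid and bright"],
    ["Tool", "9.99", "True", "Heavy duty and reliable (Maybe)"],
]


def write_sample_csv(path):
    """Write the sample table to `path` as a CSV file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(FIELDS)
        writer.writerows(SAMPLE_ROWS)
```

Calling `write_sample_csv("sample_data.csv")` (the filename is arbitrary) produces a file that can be passed to `read_file` in the same way as `reviews.csv` in the usage example.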
Usage
Here's an example of how to use PrepDataKit:
from prepdatakit import prepdatakit

if __name__ == "__main__":
    # Reading the file
    data = prepdatakit.read_file("reviews.csv")
    print("Data Information:")
    print(prepdatakit.tabulate(data.head(), headers="keys", tablefmt="fancy_grid"))
    print("\nData Type:", type(data))
    print("Data Shape:", data.shape)
    print("=" * 50)

    # Generating summary
    summary = prepdatakit.get_summary(data)
    print("\nSummary Statistics:")
    for key, value in summary.items():
        print(key + ":")
        if isinstance(value, prepdatakit.pd.DataFrame):
            print(prepdatakit.tabulate(value, headers="keys", tablefmt="fancy_grid"))
        elif isinstance(value, dict):
            for k, v in value.items():
                print(f" {k}: {v}")
        print("-" * 50)

    # Handling missing values
    clean_data = prepdatakit.handle_missing_values(data, strategy="remove")
    print("\nCleaned Data:")
    # The full cleaned table is written to a file; uncomment to preview it instead.
    # print(prepdatakit.tabulate(clean_data.head(), headers="keys", tablefmt="fancy_grid"))
    with open("clean_data.txt", "w", encoding="utf-8") as f:
        f.write(prepdatakit.tabulate(clean_data, headers="keys", tablefmt="fancy_grid"))
    print("\nData Type:", type(clean_data))

    # Encoding categorical data
    encoded_data = prepdatakit.one_hot_encode(clean_data)
    print("\nEncoded Data:")
    # The full encoded table is written to a file; uncomment to preview it instead.
    # print(prepdatakit.tabulate(encoded_data.head(), headers="keys", tablefmt="plain"))
    with open("encoded_data.txt", "w", encoding="utf-8") as f:
        f.write(prepdatakit.tabulate(encoded_data, headers="keys", tablefmt="psql"))
    print("\nData Type:", type(encoded_data))
    print("Data Shape:", encoded_data.shape)
    print("=" * 50)
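The usage example relies on PrepDataKit's own functions, whose internals are not shown here. As a rough illustration of the two transformations, the sketch below applies plain pandas to the sample data. It assumes that `strategy="remove"` means dropping rows that contain missing values and that one-hot encoding expands a categorical column into indicator columns; it is not PrepDataKit's implementation, and only the `Category` column is encoded to keep the output small.

```python
import pandas as pd

# Sample data from the table above; None marks a missing cell.
df = pd.DataFrame(
    {
        "Category": ["Fruit", "Animal", "Color", "Tool"],
        "Price": [2.50, None, 1.99, 9.99],
        "In Stock": [True, False, None, True],
        "Description": [
            "Ripe and delicious",
            "Needs more data",
            "Vivid and bright",
            "Heavy duty and reliable (Maybe)",
        ],
    }
)

# Assumed meaning of the "remove" strategy: drop every row with a missing value.
clean = df.dropna()

# One-hot encode the Category column into one indicator column per value.
encoded = pd.get_dummies(clean, columns=["Category"])

print(clean.shape)
print(list(encoded.columns))
```

Only the Fruit and Tool rows survive the drop, so the encoded frame carries indicator columns for those two categories only.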
Download files
Source Distribution
Built Distribution
File details
Details for the file prepdatakit-1.5.8.tar.gz.
File metadata
- Download URL: prepdatakit-1.5.8.tar.gz
- Upload date:
- Size: 3.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.8.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 45951b95de874843e0af1748c6da9fa74283a931d01241472298e744842b4328 |
| MD5 | 93ce8068fc3820bc5e54e6ec3d85c77e |
| BLAKE2b-256 | a7573e372c26f50c5d1945c2463375b259cfccf86bc121ee1f89935c5c385189 |
File details
Details for the file prepdatakit-1.5.8-py3-none-any.whl.
File metadata
- Download URL: prepdatakit-1.5.8-py3-none-any.whl
- Upload date:
- Size: 3.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.8.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d2912bc75742b992419912f76b0c76eacb4e6df78f72253209151950360b2e55 |
| MD5 | dbdb70dfb6286d306d28e748544389b8 |
| BLAKE2b-256 | f2c5536c789b31ea6cb4eec6267d015255c96dc1065f830c170623b10e5eb13e |
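The digests listed above can be used to check a downloaded file. A minimal sketch using the standard library's `hashlib` follows; the helper name `sha256_of` and the local file path are assumptions, and the expected value is the SHA256 digest listed for prepdatakit-1.5.8.tar.gz.

```python
import hashlib
import os

# SHA256 digest listed above for prepdatakit-1.5.8.tar.gz
EXPECTED_SHA256 = "45951b95de874843e0af1748c6da9fa74283a931d01241472298e744842b4328"


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    path = "prepdatakit-1.5.8.tar.gz"  # assumed local download location
    if os.path.exists(path):
        matches = sha256_of(path) == EXPECTED_SHA256
        print("digest matches" if matches else "digest MISMATCH")
```

Reading in chunks keeps memory use constant regardless of file size; the same function works for the wheel if the expected digest is swapped for the one in its table.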