A lightweight Python package to clean CSV files
PyCSVDataCleaner
PyCSVDataCleaner is a simple Python package designed to clean CSV files. It helps you preprocess your data by:
- Removing duplicate rows
- Removing rows with missing values
- Removing constant columns
The package is easy to use and accepts CSV files containing any kind of data. It is intended for automating the data cleaning step of a machine learning or data analysis workflow.
Features
- Remove Duplicate Rows: Automatically removes duplicate rows from the dataset.
- Remove Rows with Missing Values: Removes every row that contains an empty cell.
- Remove Constant Columns: Removes columns whose value is the same in every row.
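The three steps above can be sketched with the standard library `csv` module. This is only an illustration of the described behavior, not the package's actual implementation: the helper name `clean_rows`, the order of the steps, and the rule that a whitespace-only cell counts as missing are all assumptions.

```python
import csv
import io


def clean_rows(header, rows):
    """Hypothetical helper: return (header, rows) after the three cleaning steps.

    Assumes every row has the same number of cells as the header.
    """
    # 1. Remove duplicate rows, keeping the first occurrence and the original order.
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(row)

    # 2. Remove rows with at least one empty (or whitespace-only) cell.
    complete = [row for row in unique if all(cell.strip() for cell in row)]

    # 3. Keep only columns that hold more than one distinct value.
    keep = [
        i for i in range(len(header))
        if len({row[i] for row in complete}) > 1
    ]
    new_header = [header[i] for i in keep]
    new_rows = [[row[i] for i in keep] for row in complete]
    return new_header, new_rows


# Small in-memory CSV: one duplicate row, one row with a missing city,
# and a "country" column that never changes.
sample = "id,city,country\n1,Lyon,FR\n1,Lyon,FR\n2,,FR\n3,Nice,FR\n"
reader = csv.reader(io.StringIO(sample))
header = next(reader)
new_header, new_rows = clean_rows(header, list(reader))
print(new_header)
print(new_rows)
```

In this sketch the constant-column check runs last, so a column that only varied in rows dropped earlier is also removed. Whether the package checks in the same order is not stated here.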
Installation
You can install PyCSVDataCleaner via pip:
pip install PyCSVDataCleaner
Usage
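from PyCSVDataCleaner import PyCSVDataCleaner

# Path of the CSV file to clean, and path for the cleaned output
input_file = 'path_to_your_input_file.csv'
output_file = 'path_to_your_output_file.csv'

# Call the same name that was imported above
PyCSVDataCleaner(input_file, output_file)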
Example Output
Running the script prints a summary to the terminal showing how many rows and columns were removed:
Cleaning file: file_name.csv
--- Initial Data Info ---
Rows (excluding header): 129971
Columns: 14
Removed 0 duplicate rows.
Removed 107584 rows with missing values.
Removed 1 constant columns.
--- Cleaning Done ---
Final Rows: 22387
Final Columns: 13
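The figures in this report are consistent with each other: the final counts equal the initial counts minus what each step removed. A minimal check, using only the numbers printed above (the variable names are illustrative, not part of the package):

```python
# Numbers copied from the example report above.
initial_rows, initial_cols = 129971, 14
duplicate_rows_removed = 0
missing_value_rows_removed = 107584
constant_cols_removed = 1

# Final counts are the initial counts minus everything that was removed.
final_rows = initial_rows - duplicate_rows_removed - missing_value_rows_removed
final_cols = initial_cols - constant_cols_removed

# Share of the original rows dropped because of missing values.
missing_share = missing_value_rows_removed / initial_rows

print(final_rows, final_cols)
print(f"{missing_share:.1%} of rows had missing values")
```

In this example roughly 83% of the rows are dropped for missing values, so it is worth checking how much data survives before relying on the cleaned file.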
Download files
File details
Details for the file PyCSVDataCleaner-0.1.0.tar.gz.
File metadata
- Download URL: PyCSVDataCleaner-0.1.0.tar.gz
- Upload date:
- Size: 3.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4ac677c89de828752a2f7ddb75e23908bf398d82d96a4dd46df502ac09313105 |
| MD5 | 93146373fe409c0e515d654edc48c095 |
| BLAKE2b-256 | 9e700b6b89f72e51a1068651d37acb704f8f1eb22f9c0584edc70698eccce122 |
File details
Details for the file PyCSVDataCleaner-0.1.0-py3-none-any.whl.
File metadata
- Download URL: PyCSVDataCleaner-0.1.0-py3-none-any.whl
- Upload date:
- Size: 3.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a9dca529d2f0a906f38b73fa1db5de249cb15293d6c91506e40d100175b826c9 |
| MD5 | 49ad589da2623152de54fb456c720668 |
| BLAKE2b-256 | 04976bbdc3c0e6dba8b896c6ebb4860203fb27685db785037a0e34706a1927b0 |