
A lightweight Python package to clean CSV files


PyCSVDataCleaner

PyCSVDataCleaner is a simple Python package designed to clean CSV files. It helps you preprocess your data by:

  • Removing duplicate rows
  • Removing rows with missing values
  • Removing constant columns

The package is called with an input path and an output path, and is meant to work with CSV files regardless of the kind of data they contain. It is intended for automating the cleaning step of a machine learning or data analysis workflow.


Features

  • Remove Duplicate Rows: Drops rows that duplicate another row in the dataset.
  • Remove Rows with Missing Values: Drops rows that contain empty cells.
  • Remove Constant Columns: Drops columns whose value is the same in every row.
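
To make the three steps concrete, here is a minimal sketch using only the standard library. It is not the package's actual implementation: the function name `clean_rows`, the order of the steps (taken from the order of the log lines in the example output), the treatment of whitespace-only cells as missing, and the choice to keep all columns when fewer than two rows remain are all assumptions made for illustration. It also assumes every row has the same number of cells as the header.

```python
import csv
import io


def clean_rows(rows):
    """Hypothetical helper: clean a table given as a list of rows.

    The first row is treated as the header. Returns a new list of rows.
    """
    header, data = rows[0], rows[1:]

    # Step 1: drop duplicate rows, keeping the first occurrence of each.
    seen = set()
    unique = []
    for row in data:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(row)

    # Step 2: drop rows that contain at least one empty cell
    # (assumption: whitespace-only cells count as empty).
    complete = [row for row in unique if all(cell.strip() for cell in row)]

    # Step 3: drop columns that hold a single distinct value.
    # With fewer than two rows every column would look constant,
    # so in that case all columns are kept (an illustrative choice).
    if len(complete) < 2:
        keep = list(range(len(header)))
    else:
        keep = [
            i for i in range(len(header))
            if len({row[i] for row in complete}) > 1
        ]

    return [[row[i] for i in keep] for row in [header] + complete]


SAMPLE = (
    "id,city,country,score\n"
    "1,Paris,FR,10\n"
    "1,Paris,FR,10\n"   # duplicate of the previous row
    "2,,FR,7\n"         # missing city
    "3,Lyon,FR,8\n"
)

cleaned = clean_rows(list(csv.reader(io.StringIO(SAMPLE))))
for row in cleaned:
    print(",".join(row))
```

On this sample, the duplicate row and the row with the empty city are dropped, and the `country` column is removed because only `FR` remains in it.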

Installation

You can install PyCSVDataCleaner via pip:

pip install PyCSVDataCleaner

Usage

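from PyCSVDataCleaner import PyCSVDataCleaner

# Path of the CSV file to clean, and path where the cleaned copy is written
input_file = 'path_to_your_input_file.csv'
output_file = 'path_to_your_output_file.csv'

# Runs the cleaning steps and prints a summary to the terminal
PyCSVDataCleaner(input_file, output_file)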

Example Output

Running the script prints a summary to the terminal showing the initial size of the data, how many rows and columns each step removed, and the final size:

Cleaning file: file_name.csv

--- Initial Data Info ---
Rows (excluding header): 129971
Columns: 14
Removed 0 duplicate rows.
Removed 107584 rows with missing values.
Removed 1 constant columns.

--- Cleaning Done ---
Final Rows: 22387
Final Columns: 13
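
The figures in this summary are internally consistent: 129971 - 0 - 107584 = 22387 rows, and 14 - 1 = 13 columns. To check the reported sizes against your own input and output files, a small standard-library helper along these lines could be used. `csv_shape` is a hypothetical name and is not part of the package; the demo file exists only for illustration.

```python
import csv
import os
import tempfile


def csv_shape(path):
    """Hypothetical helper: return (rows excluding header, columns) of a CSV file."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])          # empty file: no columns
        row_count = sum(1 for _ in reader)  # remaining rows are data rows
    return row_count, len(header)


# Demo on a throwaway file with a header and two data rows.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.csv")
    with open(demo_path, "w", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerows([["a", "b"], ["1", "2"], ["3", "4"]])
    demo_shape = csv_shape(demo_path)

print(demo_shape)

# The arithmetic behind the summary shown above.
initial_rows, duplicate_rows, missing_rows = 129971, 0, 107584
initial_columns, constant_columns = 14, 1
final_rows = initial_rows - duplicate_rows - missing_rows
final_columns = initial_columns - constant_columns
```

Calling `csv_shape` on the input file before cleaning and on the output file afterwards should give the same pairs as the "Initial Data Info" and "Cleaning Done" sections of the summary.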

Built distribution: PyCSVDataCleaner-0.1.1-py3-none-any.whl (3.9 kB)
