A lightweight Python package to clean CSV files

Project description

PyCSVDataCleaner

PyCSVDataCleaner is a simple Python package designed to clean CSV files. It helps you preprocess your data by:

  • Removing duplicate rows
  • Removing rows with missing values
  • Removing constant columns

The package is easy to use and works with CSV files containing any kind of data. It is ideal for automating the data cleaning process during your machine learning or data analysis workflow.


Features

  • Remove Duplicate Rows: Automatically removes duplicate rows from the dataset.
  • Remove Rows with Missing Values: Cleans your dataset by eliminating rows with empty cells.
  • Remove Constant Columns: Removes columns that contain constant values across all rows.
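
The three steps listed above can be sketched with the standard library alone. This is an illustrative sketch, not the package's actual implementation: the helper name `clean_rows`, the order of the steps, and the rule that a cell counts as missing when it is empty or whitespace-only are all assumptions made for the example.

```python
import csv
import io


def clean_rows(rows):
    """Clean a table given as a list of rows, where rows[0] is the header.

    Hypothetical helper for illustration; not part of PyCSVDataCleaner.
    """
    if not rows:
        return []
    header, data = rows[0], rows[1:]

    # 1. Drop duplicate rows, keeping the first occurrence of each.
    seen = set()
    unique = []
    for row in data:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(row)

    # 2. Drop rows with a missing value (assumption: an empty or
    #    whitespace-only cell, or a row whose length differs from the header).
    complete = [
        row for row in unique
        if len(row) == len(header) and all(cell.strip() for cell in row)
    ]

    # 3. Drop constant columns: keep only columns that still hold
    #    more than one distinct value across the remaining rows.
    keep = [
        i for i in range(len(header))
        if len({row[i] for row in complete}) > 1
    ]
    return [[header[i] for i in keep]] + [[row[i] for i in keep] for row in complete]


# Small in-memory CSV: one duplicate row, one row with an empty cell,
# and a "country" column that never changes.
sample = io.StringIO(
    "id,city,country\n"
    "1,Paris,FR\n"
    "1,Paris,FR\n"
    "2,,FR\n"
    "3,Lyon,FR\n"
)
cleaned = clean_rows(list(csv.reader(sample)))
print(cleaned)  # [['id', 'city'], ['1', 'Paris'], ['3', 'Lyon']]
```

In this sketch the constant-column check runs last, so a column that only becomes constant after incomplete rows are dropped is also removed. Whether the package applies the steps in the same order is not stated in this description.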

Installation

You can install PyCSVDataCleaner via pip:

pip install PyCSVDataCleaner

Usage

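from PyCSVDataCleaner import PyCSVDataCleaner

# Input CSV to clean and destination for the cleaned output
input_file = 'path_to_your_input_file.csv'
output_file = 'path_to_your_output_file.csv'

# Run the cleaner; a summary is printed to the terminal
PyCSVDataCleaner(input_file, output_file)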

Example Output

When you run the script, the terminal output reports how many rows and columns were removed:

Cleaning file: file_name.csv

--- Initial Data Info ---
Rows (excluding header): 129971
Columns: 14
Removed 0 duplicate rows.
Removed 107584 rows with missing values.
Removed 1 constant columns.

--- Cleaning Done ---
Final Rows: 22387
Final Columns: 13
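
The figures in the log above are internally consistent, which a few lines of arithmetic confirm. This is only a sanity check on the reported numbers; the variable names are chosen for this example and do not come from the package.

```python
# Values copied from the example log above
initial_rows, initial_cols = 129971, 14
removed_duplicates = 0
removed_missing = 107584
removed_constant_cols = 1

# Each removal step lowers the row or column count
final_rows = initial_rows - removed_duplicates - removed_missing
final_cols = initial_cols - removed_constant_cols

print(final_rows, final_cols)  # 22387 13
```

Roughly 83% of the rows in this example were dropped for missing values, so it is worth checking this line of the summary before relying on the cleaned file.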

Download files

Source Distribution

PyCSVDataCleaner-0.1.0.tar.gz (3.2 kB view details)

Uploaded Source

Built Distribution

PyCSVDataCleaner-0.1.0-py3-none-any.whl (3.9 kB view details)

Uploaded Python 3

File details

Details for the file PyCSVDataCleaner-0.1.0.tar.gz.

File metadata

  • Download URL: PyCSVDataCleaner-0.1.0.tar.gz
  • Size: 3.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.5

File hashes

Hashes for PyCSVDataCleaner-0.1.0.tar.gz
Algorithm Hash digest
SHA256 4ac677c89de828752a2f7ddb75e23908bf398d82d96a4dd46df502ac09313105
MD5 93146373fe409c0e515d654edc48c095
BLAKE2b-256 9e700b6b89f72e51a1068651d37acb704f8f1eb22f9c0584edc70698eccce122

File details

Details for the file PyCSVDataCleaner-0.1.0-py3-none-any.whl.

File hashes

Hashes for PyCSVDataCleaner-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 a9dca529d2f0a906f38b73fa1db5de249cb15293d6c91506e40d100175b826c9
MD5 49ad589da2623152de54fb456c720668
BLAKE2b-256 04976bbdc3c0e6dba8b896c6ebb4860203fb27685db785037a0e34706a1927b0
