
A CLI tool to clean Excel/CSV data files and generate graphs.


TidyDataCLI

TidyDataCLI is a command-line tool for cleaning and processing Excel/CSV data. It can remove duplicate rows, sanitize values with regular expressions, and generate frequency plots. It runs on any operating system that has Python, and it can also run inside Docker if you would rather not install Python at all.

Features

  • Remove Duplicates: Easily remove duplicate entries from your dataset.
  • Regex Cleaning: Sanitize your data by applying regular expressions to clean up unwanted patterns.
  • Frequency Plots: Generate visual frequency plots for any column in your dataset.
  • Cross-Platform Compatibility: Run on any platform, including via Docker.
  • Support for Excel and CSV files: Seamlessly handle both .csv and .xlsx files.

Installation

To install TidyDataCLI from PyPI, run:

pip install TidyDataCLI

Usage

1. Remove Duplicates

To remove duplicate rows from a dataset:

tidydata remove_duplicates --input-file input.csv --output-file output.csv

You can also specify a subset of columns to check for duplicates:

tidydata remove_duplicates --input-file input.csv --output-file output.csv --subset column1,column2
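To illustrate what `--subset` changes, here is a minimal pure-Python sketch of subset-based deduplication. It is not TidyDataCLI's source: the helper name `remove_duplicates` is hypothetical, and it assumes the first occurrence of each duplicate is the one kept (the default behaviour of pandas' `drop_duplicates`).

```python
import csv
import io


def remove_duplicates(rows, subset=None):
    """Hypothetical helper: drop duplicate rows, keeping the first occurrence.

    rows: list of dicts, one per data row (as produced by csv.DictReader)
    subset: optional list of column names to compare; None compares every column
    """
    seen = set()
    unique = []
    for row in rows:
        # Build the comparison key from the chosen columns only.
        columns = subset if subset is not None else list(row.keys())
        key = tuple(row[col] for col in columns)
        if key not in seen:  # assumption: first occurrence wins
            seen.add(key)
            unique.append(row)
    return unique


data = "Name,Age,Country\nAlice,29,USA\nBob,32,Canada\nAlice,30,USA\n"
rows = list(csv.DictReader(io.StringIO(data)))

print(len(remove_duplicates(rows)))                              # 3: no exact duplicates
print(len(remove_duplicates(rows, subset=["Name", "Country"])))  # 2: second Alice row dropped
```

Comparing every column keeps both Alice rows because their ages differ; restricting the comparison to `Name` and `Country` treats them as duplicates.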

2. Regex Cleaning

To clean your data using a regular expression:

tidydata regex_clean --input-file input.csv --output-file output.csv --pattern "\d+"

This deletes every match of the pattern, here any run of digits, from your data.
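The cleaning step amounts to replacing each match of the pattern with an empty string. A minimal sketch using Python's `re` module, assuming the pattern is applied cell by cell to the data rows (the helper name `regex_clean` is hypothetical and this is not TidyDataCLI's source):

```python
import re


def regex_clean(rows, pattern):
    """Hypothetical helper: delete every match of `pattern` from every cell."""
    regex = re.compile(pattern)
    # Replace each match with "" in each cell, preserving the row/column layout.
    return [[regex.sub("", cell) for cell in row] for row in rows]


rows = [["Alice", "29", "USA"], ["Bob", "32", "Canada"]]
print(regex_clean(rows, r"\d+"))
# [['Alice', '', 'USA'], ['Bob', '', 'Canada']]
```

In Python source the pattern is written as a raw string (`r"\d+"`) so the backslash reaches the regex engine unchanged; on the command line, double quotes in a POSIX shell also leave `\d` intact.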

3. Generate Frequency Plots

To generate a frequency plot for a specific column:

tidydata plot_frequency --input-file input.csv --column-name column_name --output-dir ./plots

The frequency plot will be saved as a .png file in the specified output directory.
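The bar heights in a frequency plot are simply the number of times each distinct value appears in the column. A minimal sketch of that counting step with `collections.Counter` (the helper name `column_frequencies` is hypothetical, the plotting itself is omitted, and TidyDataCLI's actual implementation may differ):

```python
import csv
import io
from collections import Counter


def column_frequencies(rows, column):
    """Hypothetical helper: count each value in `column`, most common first."""
    return Counter(row[column] for row in rows).most_common()


data = "Name,Age,Country\nAlice,29,USA\nBob,32,Canada\nAlice,29,USA\n"
rows = list(csv.DictReader(io.StringIO(data)))
print(column_frequencies(rows, "Country"))
# [('USA', 2), ('Canada', 1)]
```

Each `(value, count)` pair corresponds to one bar in the plot.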

Example

Given a CSV file data.csv:

Name, Age, Country
Alice, 29, USA
Bob, 32, Canada
Alice, 29, USA

Removing Duplicates

Command:

tidydata remove_duplicates --input-file data.csv --output-file cleaned_data.csv

Output (cleaned_data.csv):

Name, Age, Country
Alice, 29, USA
Bob, 32, Canada

Regex Cleaning

Command:

tidydata regex_clean --input-file data.csv --output-file cleaned_data.csv --pattern "\d"

Output (cleaned_data.csv):

Name, Age, Country
Alice, , USA
Bob, , Canada
Alice, , USA

Generating Frequency Plot

Command:

tidydata plot_frequency --input-file data.csv --column-name Country --output-dir ./plots

This generates a bar plot showing the frequency of each country in the dataset, saved in the ./plots directory.

Docker Support

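If you prefer not to install Python or other dependencies, you can run TidyDataCLI with Docker. The command below expects an image tagged tidydatacli and mounts the current directory at /data inside the container, so input and output paths must start with /data:

docker run --rm -v "$(pwd)":/data tidydatacli tidydata <command> --input-file /data/input.csv --output-file /data/output.csv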

Contributing

Contributions are welcome! Please feel free to submit a pull request or open an issue.

License

TidyDataCLI is licensed under the MIT License. See the LICENSE file for more details.

Contact

For any questions or issues, please contact Siama at siamaphilbert@outlook.com.



Download files


Source Distribution

tidydatacli-0.1.0.tar.gz (5.1 kB)

Built Distribution

TidyDataCLI-0.1.0-py3-none-any.whl (4.8 kB)

File details

Details for the file tidydatacli-0.1.0.tar.gz.

File metadata

  • Download URL: tidydatacli-0.1.0.tar.gz
  • Size: 5.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.4

File hashes

Hashes for tidydatacli-0.1.0.tar.gz
  • SHA256: bb22ebba9df0307a0c2faa9618940b741bc2d8c567f52ed4b731ae85457f2e03
  • MD5: 0aa0776b3043b7f6f56fb271a2c1cf95
  • BLAKE2b-256: 81238dbf8ec4f7e7e72885e3d5c7151b9791f76e7a0b57ce19369d4896578171


File details

Details for the file TidyDataCLI-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: TidyDataCLI-0.1.0-py3-none-any.whl
  • Size: 4.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.4

File hashes

Hashes for TidyDataCLI-0.1.0-py3-none-any.whl
  • SHA256: e61eec205df85500e4c261cbe3bbd790284379512f48c82766412217eb158fa6
  • MD5: 39159f29195a16b80edccf6ca20d67b5
  • BLAKE2b-256: 01644e8c0f449082074a377e9d6404eb5980243acf4472acce33f89df3897fa8

