A CLI tool for cleaning Excel/CSV data files and generating graphs.
TidyDataCLI
TidyDataCLI is a command-line tool for cleaning and processing Excel and CSV data. It removes duplicate rows, sanitizes values with regular expressions, and generates frequency plots. It is cross-platform and can also run in Docker, so a local Python installation is not required.
Features
- Remove Duplicates: Easily remove duplicate entries from your dataset.
- Regex Cleaning: Sanitize your data by applying regular expressions to clean up unwanted patterns.
- Frequency Plots: Generate visual frequency plots for any column in your dataset.
- Cross-Platform Compatibility: Run on any platform, including via Docker.
- Support for Excel and CSV files: Seamlessly handle both .csv and .xlsx files.
Installation
To install TidyDataCLI, simply run:
pip install TidyDataCLI
Usage
1. Remove Duplicates
To remove duplicate rows from a dataset:
tidydata remove_duplicates --input-file input.csv --output-file output.csv
You can also specify a subset of columns to check for duplicates:
tidydata remove_duplicates --input-file input.csv --output-file output.csv --subset column1,column2
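To illustrate what restricting the duplicate check to a subset of columns means, here is a minimal Python sketch that uses only the standard library. It is not TidyDataCLI's actual implementation. The function name `remove_duplicates` and the choice to keep the first occurrence of each duplicate are assumptions made for illustration.

```python
import csv
import io


def remove_duplicates(rows, subset=None):
    """Keep the first occurrence of each row, comparing only `subset` columns if given.

    Hypothetical helper for illustration; not part of TidyDataCLI's API.
    """
    seen = set()
    unique = []
    for row in rows:
        # Build the comparison key from the chosen columns (or all of them).
        columns = subset if subset else list(row.keys())
        key = tuple(row[col] for col in columns)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


sample = "Name,Age,Country\nAlice,29,USA\nBob,32,Canada\nAlice,30,USA\n"
rows = list(csv.DictReader(io.StringIO(sample)))

# Comparing all columns: no row is an exact repeat, so all 3 rows remain.
print(len(remove_duplicates(rows)))
# Comparing only Name and Country: the second Alice row is dropped, leaving 2.
print(len(remove_duplicates(rows, ["Name", "Country"])))
```

With a subset, two rows count as duplicates even when they differ in the columns left out, which is why the second Alice row (Age 30) is dropped in the last call.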
2. Regex Cleaning
To clean your data using a regular expression:
tidydata regex_clean --input-file input.csv --output-file output.csv --pattern "\d+"
This will remove all numeric characters from your data.
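The effect of a removal pattern such as `\d+` can be sketched with Python's `re` module. This is an illustration of the idea, not the tool's own code. The helper name `regex_clean` is hypothetical, and the sketch assumes every match is deleted (replaced with an empty string).

```python
import re


def regex_clean(values, pattern):
    """Delete every match of `pattern` from each string in `values`.

    Hypothetical helper for illustration; not part of TidyDataCLI's API.
    """
    compiled = re.compile(pattern)
    return [compiled.sub("", value) for value in values]


cleaned = regex_clean(["Alice", "29", "Room 101B"], r"\d+")
print(cleaned)  # ['Alice', '', 'Room B']
```

A value made up entirely of digits becomes an empty string, so numeric columns end up blank after this kind of cleaning.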
3. Generate Frequency Plots
To generate a frequency plot for a specific column:
tidydata plot_frequency --input-file input.csv --column-name column_name --output-dir ./plots
The frequency plot will be saved as a .png file in the specified output directory.
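A frequency plot is built from per-value counts for one column. The sketch below shows only that counting step, using the standard library, and leaves the drawing out. It is an assumption about the general approach rather than TidyDataCLI's implementation, and `column_frequencies` is a hypothetical name.

```python
import csv
import io
from collections import Counter


def column_frequencies(csv_text, column_name):
    """Count how often each value appears in one column of a CSV document.

    Hypothetical helper for illustration; not part of TidyDataCLI's API.
    """
    # skipinitialspace=True ignores the space that follows each comma.
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    return Counter(row[column_name] for row in reader)


sample = "Name, Age, Country\nAlice, 29, USA\nBob, 32, Canada\nAlice, 29, USA\n"
counts = column_frequencies(sample, "Country")
print(counts.most_common())  # [('USA', 2), ('Canada', 1)]
```

Each (value, count) pair corresponds to one bar in the resulting plot.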
Example
Given a CSV file data.csv:
Name, Age, Country
Alice, 29, USA
Bob, 32, Canada
Alice, 29, USA
Removing Duplicates
Command:
tidydata remove_duplicates --input-file data.csv --output-file cleaned_data.csv
Output (cleaned_data.csv):
Name, Age, Country
Alice, 29, USA
Bob, 32, Canada
Regex Cleaning
Command:
tidydata regex_clean --input-file data.csv --output-file cleaned_data.csv --pattern "\d"
Output (cleaned_data.csv):
Name, Age, Country
Alice, , USA
Bob, , Canada
Alice, , USA
Generating Frequency Plot
Command:
tidydata plot_frequency --input-file data.csv --column-name Country --output-dir ./plots
This generates a bar plot showing the frequency of each country in the dataset, saved in the ./plots directory.
Docker Support
If you prefer not to install Python or other dependencies, you can use TidyDataCLI with Docker:
docker run -v "$(pwd)":/data tidydatacli tidydata <command> --input-file /data/input.csv --output-file /data/output.csv
Contributing
Contributions are welcome! Please feel free to submit a pull request or open an issue.
License
TidyDataCLI is licensed under the MIT License. See the LICENSE file for more details.
Contact
For any questions or issues, please contact Siama at siamaphilbert@outlook.com.