A CLI tool to automate cleaning, transformation and visualisation of Excel/CSV data.

TidyDataCLI

Overview

TidyDataCLI is a command-line tool that automates cleaning, transforming, and visualizing Excel/CSV data. It is cross-platform: it runs on Linux, macOS, and Windows, and it can also be run through Docker without a local Python installation.

Why use TidyDataCLI?
TidyDataCLI groups its features into four areas:

  • Data Cleaning: Remove duplicates, standardize column names, trim spaces, validate ages, and standardize dates.
  • Data Transformation: Sort, filter, aggregate, and apply custom transformations.
  • Visualization: Generate charts such as bar charts, word clouds, heat maps, and Gantt charts.
  • Report Generation: Create PDF or text reports directly from your data files.

Features

Data Cleaning

- Remove Duplicates: Efficiently remove duplicate entries from your dataset.
- Regex Cleaning: Sanitize data using customizable regular expressions.
- Column Name Cleaning: Standardize column names by stripping spaces and converting to lowercase.
- Trim Spaces: Remove leading and trailing spaces from string columns.
- Age Validation: Validate and clean 'age' columns to ensure data integrity.
- Change Case: Convert text columns to lowercase, uppercase, title case, or capitalize.
- Date Standardization: Standardize date formats across specified columns.
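To make three of the cleaning steps above concrete, here is a minimal sketch of column name cleaning, space trimming, and duplicate removal using only the Python standard library. It is an illustration of the operations, not TidyDataCLI's actual implementation; the function name `clean_rows` and the sample data are hypothetical.

```python
import csv
import io


def clean_rows(csv_text):
    """Return (header, rows) with cleaned column names, trimmed cells, and no duplicate rows."""
    reader = csv.reader(io.StringIO(csv_text))
    raw_header = next(reader)
    # Column name cleaning: strip spaces and convert to lowercase.
    header = [name.strip().lower() for name in raw_header]
    seen = set()
    rows = []
    for raw in reader:
        # Trim spaces: remove leading and trailing whitespace from every cell.
        row = tuple(cell.strip() for cell in raw)
        # Remove duplicates: keep only the first occurrence of each row.
        if row not in seen:
            seen.add(row)
            rows.append(list(row))
    return header, rows


# Hypothetical sample data with untidy names, stray spaces, and one duplicate.
sample = " Name ,AGE\nAda , 36\nAda,36\nLinus,54\n"
header, rows = clean_rows(sample)
print(header)  # ['name', 'age']
print(rows)    # [['Ada', '36'], ['Linus', '54']]
```

Note that trimming happens before the duplicate check, so rows that differ only in surrounding whitespace count as duplicates in this sketch.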

Data Transformation

- Sorting: Sort data by one or more columns with ascending or descending options.
- Filtering: Apply conditions to filter rows based on specified criteria.
- Custom Transformations: Apply user-defined lambda functions for complex transformations.
- Column Addition: Add values to existing columns and perform arithmetic operations.
- Aggregation: Aggregate data by summing, averaging, or counting grouped values.
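The sorting, filtering, and aggregation steps listed above can be sketched in plain Python as follows. This is a hedged illustration of what the operations do, not the tool's own code; the helper names and the sample records are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical records standing in for rows read from a CSV file.
records = [
    {"category": "books", "sales": 120, "age": 34},
    {"category": "games", "sales": 80, "age": 27},
    {"category": "books", "sales": 60, "age": 45},
]


def sort_rows(rows, column, descending=False):
    """Sorting: order rows by one column, ascending or descending."""
    return sorted(rows, key=lambda r: r[column], reverse=descending)


def filter_rows(rows, predicate):
    """Filtering: keep only the rows for which the condition holds."""
    return [r for r in rows if predicate(r)]


def aggregate_sum(rows, group_column, value_column):
    """Aggregation: sum one column within each group of another column."""
    totals = defaultdict(int)
    for r in rows:
        totals[r[group_column]] += r[value_column]
    return dict(totals)


older = filter_rows(records, lambda r: r["age"] > 30)
by_sales = sort_rows(older, "sales", descending=True)
totals = aggregate_sum(records, "category", "sales")
print([r["sales"] for r in by_sales])  # [120, 60]
print(totals)                          # {'books': 180, 'games': 80}
```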

Visualization

- Bar Charts: Generate bar charts with customizable x and y axes.
- Pie Charts: Create pie charts with labels and values for visualization.
- Word Clouds: Visualize text data using word clouds.
- Line Charts: Plot line charts for trend analysis.
- Box-and-Whisker Plots: Create box plots to analyze data distributions.
- Gantt Charts: Visualize project timelines with Gantt charts.
- Heat Maps: Generate heat maps to represent data density.
- Histograms: Plot histograms with adjustable bin sizes.
- Tree Maps: Visualize hierarchical data using tree maps.
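As a rough illustration of what an adjustable bin count means for a histogram, the sketch below counts values into equal-width bins in plain Python. Whether TidyDataCLI uses equal-width bins is an assumption, and `histogram_counts` is a hypothetical helper, not part of the tool.

```python
def histogram_counts(values, bins):
    """Count values in `bins` equal-width intervals spanning min(values)..max(values)."""
    if bins < 1:
        raise ValueError("bins must be at least 1")
    low, high = min(values), max(values)
    # Fall back to a width of 1 when all values are equal, to avoid dividing by zero.
    width = (high - low) / bins or 1
    counts = [0] * bins
    for v in values:
        # The maximum value is placed in the last bin rather than a new one.
        index = min(int((v - low) / width), bins - 1)
        counts[index] += 1
    return counts


ages = [21, 25, 30, 34, 38, 41, 45, 52, 60]
print(histogram_counts(ages, 3))  # [3, 4, 2]
print(histogram_counts(ages, 2))  # [5, 4]
```

Changing the number of bins changes how coarse the picture of the distribution is, which is the trade-off a bin-size option exposes.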

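Report Generation

- PDF and Text Reports: Generate reports in PDF or text format, as summaries or detailed outputs.

Cross-Platform

- Runs on Linux, macOS, and Windows, and in Docker environments.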

Installation

Requirements

  • Python 3.7+
  • pip (Python package manager)
  • Docker (optional, for containerized execution)

Install via pip

pip install TidyDataCLI

Install from Source

  1. Clone the repository:
    git clone https://github.com/Siam3h/tidydatacli.git
    
  2. Navigate to the directory:
    cd tidydatacli
    
  3. Install the package:
    pip install .
    

Running with Docker

For a containerized approach:

  1. Pull the Docker image:
    docker pull tidydatacli
    
  2. Run TidyDataCLI via Docker:
    docker run -v "$(pwd)":/data tidydatacli tidydata <command> --input /data/input.csv --output /data/output.csv
    

Usage

Once installed, TidyDataCLI can be invoked using the following syntax:

tidydata <command> [options]

Example Commands

Cleaning Data:

tidydata clean --input data.csv --output cleaned_data.csv --remove_duplicates --clean_columns

Transforming Data:

tidydata transform --input data.csv --output transformed.csv --sort column1 --filter "age > 30"

Visualizing Data:

tidydata visualize --input data.csv --type bar --x category --y sales --output bar_chart.png

Generating Reports:

tidydata report --input data.csv --output report.pdf --format pdf --summary
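The example commands can also be scripted. The sketch below builds the `tidydata clean` invocation shown above for every CSV file in a directory, using only the flags that appear in this README. The helper names, the `cleaned_` output prefix, and the dry-run default are assumptions made for the illustration.

```python
import subprocess
from pathlib import Path


def build_clean_command(input_path, output_dir):
    """Build the argument list for one `tidydata clean` run."""
    input_path = Path(input_path)
    # Hypothetical naming scheme: prefix the original file name with "cleaned_".
    output_path = Path(output_dir) / f"cleaned_{input_path.name}"
    return [
        "tidydata", "clean",
        "--input", str(input_path),
        "--output", str(output_path),
        "--remove_duplicates",
        "--clean_columns",
    ]


def clean_all(input_dir, output_dir, dry_run=True):
    """Clean every CSV file in input_dir; with dry_run, only print the commands."""
    commands = [
        build_clean_command(path, output_dir)
        for path in sorted(Path(input_dir).glob("*.csv"))
    ]
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            # check=True raises CalledProcessError if tidydata exits with an error.
            subprocess.run(command, check=True)
    return commands
```

Calling `clean_all("raw", "out")` prints one command per CSV file found in `raw`; passing `dry_run=False` runs them and requires `tidydata` to be installed and on the PATH.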

Commands Overview

1. clean

Clean your dataset by removing duplicates, trimming spaces, or performing regex-based cleaning.

2. transform

Apply transformations such as sorting, filtering, adding columns, and custom lambda functions.

3. visualize

Create visual representations of your data, such as bar charts, pie charts, and word clouds.

4. report

Generate reports in text or PDF format with customizable summaries or detailed outputs.
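For readers curious how a four-command interface like this is typically wired up, here is a minimal `argparse` sketch with one subparser per command. It mirrors only the options shown in the example commands of this README and is not TidyDataCLI's source code; the real tool may define its options differently.

```python
import argparse


def build_parser():
    """Build a parser with one subcommand per command described above."""
    parser = argparse.ArgumentParser(prog="tidydata")
    subparsers = parser.add_subparsers(dest="command", required=True)
    commands = {}
    for name in ("clean", "transform", "visualize", "report"):
        sub = subparsers.add_parser(name)
        # Every example command in this README takes an input and an output path.
        sub.add_argument("--input", required=True)
        sub.add_argument("--output", required=True)
        commands[name] = sub
    # Options taken from the example commands; the real option set is likely larger.
    commands["clean"].add_argument("--remove_duplicates", action="store_true")
    commands["clean"].add_argument("--clean_columns", action="store_true")
    commands["transform"].add_argument("--sort")
    commands["transform"].add_argument("--filter")
    return parser


args = build_parser().parse_args(
    ["clean", "--input", "data.csv", "--output", "cleaned_data.csv", "--remove_duplicates"]
)
print(args.command, args.remove_duplicates, args.clean_columns)  # clean True False
```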

Running with Docker

To avoid managing Python dependencies locally, run any command through the Docker image. For example, to clean a file in the current directory:

docker run -v "$(pwd)":/data tidydatacli tidydata clean --input /data/input.csv --output /data/output.csv

Error Handling

TidyDataCLI displays an error message for common problems such as a missing input file, invalid column names, or missing options.

Example error:

Error: Input file 'non_existent_file.csv' not found.
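A check that produces a message of this shape could look like the sketch below. It is an assumption about how such validation might be written, not the tool's actual code; the function names and the exit status of 1 are hypothetical.

```python
import sys
from pathlib import Path


def check_input_file(path):
    """Return an error message if the input file is missing, otherwise None."""
    if not Path(path).is_file():
        return f"Error: Input file '{path}' not found."
    return None


def main(argv):
    """Validate the first argument as an input file and return an exit status."""
    message = check_input_file(argv[0])
    if message is not None:
        # Report the problem on stderr; a non-zero status (assumed here) signals failure.
        print(message, file=sys.stderr)
        return 1
    return 0
```

Validating the input path before doing any work lets the command fail fast with a readable message instead of a Python traceback.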

Contributing

We welcome contributions!

  1. Fork the repository.
  2. Create a new branch.
  3. Make your changes and submit a pull request.

Found a bug or have a suggestion? Please open an issue on GitHub.

License

TidyDataCLI is licensed under the MIT License. See the LICENSE file for more details.

Contact

For any questions or issues, please contact Siama at siamaphilbert@outlook.com.

