
A comprehensive Python data processing tool for cleaning, visualization, and analysis

mizuio - Python Data Processing Toolkit

mizuio is a comprehensive Python toolkit for data cleaning, visualization, and analysis. It provides a modern command-line interface and Python API for efficient data workflows, leveraging Pandas, NumPy, Matplotlib, Seaborn, and scikit-learn.


🚀 Features

Data Cleaning (DataCleaner)

  • Handle missing values: drop, fill, or interpolate
  • Remove duplicates by columns
  • Automatic data type conversion
  • Outlier detection and removal (IQR, Z-score)
  • Text normalization (case, whitespace)
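
The cleaning steps listed above can be sketched with plain pandas. This is not mizuio's API (this README does not show the method signatures of DataCleaner); the column names and toy values are made up for illustration, and median filling and the 1.5 × IQR rule are assumed choices.

```python
import numpy as np
import pandas as pd

# Toy data: messy text, one duplicate row, one missing age, one extreme age.
df = pd.DataFrame({
    "name": ["  Alice ", "BOB", "BOB", "carol", "Dave", "erin"],
    "age": [30.0, 26.0, 26.0, np.nan, 28.0, 400.0],
})

# Text normalization: trim surrounding whitespace and lowercase.
df["name"] = df["name"].str.strip().str.lower()

# Remove duplicates, comparing only the listed columns.
df = df.drop_duplicates(subset=["name", "age"])

# Handle missing values: fill with the column median (one of several options).
df["age"] = df["age"].fillna(df["age"].median())

# Outlier removal with the IQR rule: keep values within 1.5 * IQR of the quartiles.
q1, q3 = df["age"].quantile([0.25, 0.75])
iqr = q3 - q1
in_range = df["age"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
cleaned = df[in_range].reset_index(drop=True)

print(cleaned)
```

The order matters here: normalizing text first lets rows that differ only in case or whitespace be caught as duplicates, and filling missing values before the IQR step keeps the quartiles defined for every row.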

Data Visualization (DataVisualizer)

  • Histograms and distribution plots
  • Box plots for outlier analysis
  • Scatter plots for variable relationships
  • Correlation heatmaps
  • Bar and line charts (categorical/time series)
  • Missing value visualization
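
As a rough illustration of two of the plot types above, the following sketch draws a histogram and a correlation heatmap with Matplotlib directly. It does not use DataVisualizer, whose interface is not documented here; the synthetic columns (age, income, spend) are assumptions made for the example.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic data: spend is constructed to correlate with income.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "age": rng.normal(40, 10, 200),
    "income": rng.normal(50_000, 8_000, 200),
})
df["spend"] = 0.3 * df["income"] + rng.normal(0, 1_000, 200)

fig, (ax_hist, ax_corr) = plt.subplots(1, 2, figsize=(10, 4))

# Histogram of a single numeric column.
ax_hist.hist(df["age"], bins=20)
ax_hist.set_title("age distribution")
ax_hist.set_xlabel("age")

# Correlation heatmap: pairwise Pearson correlations drawn as an image.
corr = df.corr()
image = ax_corr.imshow(corr.to_numpy(), vmin=-1, vmax=1, cmap="coolwarm")
ax_corr.set_xticks(range(len(corr.columns)))
ax_corr.set_xticklabels(corr.columns)
ax_corr.set_yticks(range(len(corr.columns)))
ax_corr.set_yticklabels(corr.columns)
ax_corr.set_title("correlation heatmap")
fig.colorbar(image, ax=ax_corr)

fig.tight_layout()
# fig.savefig("overview.png") would write the figure to disk.
```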

Utility Tools (DataUtils)

  • Multi-format support: CSV, JSON, Excel, Parquet, Pickle
  • Data validation (columns, types, value ranges)
  • Data sampling (random, systematic, stratified)
  • Data splitting (train/validation/test)
  • Categorical encoding (label, one-hot, ordinal)
  • Feature scaling (standard, minmax, robust)
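
The encoding, splitting, and scaling utilities above correspond to well-known pandas and scikit-learn operations. The sketch below shows those underlying operations, not the DataUtils API itself; the column names and the 60/20/20 ratio are assumptions chosen for illustration.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Toy data: one categorical and one numeric column, 20 rows.
df = pd.DataFrame({
    "city": ["ankara", "izmir", "ankara", "bursa", "izmir"] * 4,
    "income": [float(v) for v in range(20)],
})

# Categorical encoding: one-hot encode the city column.
encoded = pd.get_dummies(df, columns=["city"])

# Train/validation/test split (60/20/20) done as two successive splits:
# first hold out 20% for test, then 25% of the remaining 80% for validation.
train_val, test = train_test_split(encoded, test_size=0.2, random_state=0)
train, val = train_test_split(train_val, test_size=0.25, random_state=0)

# Feature scaling: fit on the training set only, then reuse the fitted
# scaler on validation and test data to avoid leaking their statistics.
scaler = StandardScaler().fit(train[["income"]])
train_scaled = scaler.transform(train[["income"]])
val_scaled = scaler.transform(val[["income"]])
test_scaled = scaler.transform(test[["income"]])

print(len(train), len(val), len(test))
```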

📦 Installation

Requirements

  • Python 3.7+
  • pandas
  • numpy
  • matplotlib
  • seaborn
  • scikit-learn

Steps

  1. Clone the repository:
    git clone https://github.com/mertskzc/mizuio.git
    cd mizuio
    
  2. Install dependencies:
    pip install -r requirements.txt
    
  3. Install in development mode (optional):
    pip install -e .
    

🖥️ Usage

Command Line Interface

mizuio provides a CLI for common data tasks:

# Clean a dataset
mizuio clean data.csv --output cleaned_data.csv --remove-duplicates --fill-missing --remove-outliers

# Visualize a column
mizuio visualize data.csv --plot histogram --column age --output age_hist.png

# Show data info
mizuio info data.csv
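
To call the CLI from a Python script, one option is to build the argument list and hand it to subprocess. The sketch below uses only the flags shown in the clean example above; the helper names build_clean_command and run_clean are hypothetical and not part of mizuio.

```python
import shutil
import subprocess


def build_clean_command(input_path, output_path):
    """Return the argv list for the `mizuio clean` call shown in this README."""
    return [
        "mizuio", "clean", input_path,
        "--output", output_path,
        "--remove-duplicates", "--fill-missing", "--remove-outliers",
    ]


def run_clean(input_path, output_path):
    """Run the clean command; raises if the CLI is missing or exits non-zero."""
    if shutil.which("mizuio") is None:
        raise RuntimeError("mizuio CLI not found on PATH")
    return subprocess.run(build_clean_command(input_path, output_path), check=True)


print(" ".join(build_clean_command("data.csv", "cleaned_data.csv")))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.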

CLI Commands

  • clean: Clean data (remove duplicates, fill missing values, remove outliers)
  • visualize: Visualize data (histogram, boxplot, scatter, correlation)
  • info: Show data summary (shape, memory, columns, missing values, duplicates)
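
For reference, the fields that info reports (shape, memory, columns, missing values, duplicates) can all be computed with pandas. The function below is a hypothetical sketch of such a summary, not the command's actual implementation or output format.

```python
import numpy as np
import pandas as pd


def summarize(df):
    """Collect the kind of summary fields the info command is described as showing."""
    return {
        "shape": df.shape,
        "memory_bytes": int(df.memory_usage(deep=True).sum()),
        "columns": {name: str(dtype) for name, dtype in df.dtypes.items()},
        "missing_values": df.isna().sum().to_dict(),
        "duplicate_rows": int(df.duplicated().sum()),
    }


df = pd.DataFrame({
    "age": [30.0, np.nan, 30.0],
    "city": ["ankara", "izmir", "ankara"],
})
summary = summarize(df)
for key, value in summary.items():
    print(f"{key}: {value}")
```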

🧪 Testing

Run all tests:

python -m pytest tests/

Run a specific test file:

python -m pytest tests/test_cleaner.py

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/your-feature)
  3. Commit your changes (git commit -m 'Add feature')
  4. Push your branch (git push origin feature/your-feature)
  5. Open a Pull Request

📝 License

This project is licensed under the MIT License. See the LICENSE file for details.


🙏 Acknowledgements

mizuio uses the following open source libraries: pandas, NumPy, Matplotlib, Seaborn, and scikit-learn.
