A comprehensive Python data processing tool for cleaning, visualization, and analysis
mizuio - Python Data Processing Toolkit
mizuio is a Python toolkit for data cleaning, visualization, and analysis. It provides a command-line interface and a Python API, both built on Pandas, NumPy, Matplotlib, Seaborn, and scikit-learn.
🚀 Features
Data Cleaning (DataCleaner)
- Handle missing values: drop, fill, or interpolate
- Remove duplicates by columns
- Automatic data type conversion
- Outlier detection and removal (IQR, Z-score)
- Text normalization (case, whitespace)
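To make the two outlier rules named above concrete, here is a minimal sketch in plain pandas. It does not use mizuio's DataCleaner API, which this README does not document; the function names `remove_outliers_iqr` and `remove_outliers_zscore`, the `k` and `threshold` parameters, and the sample data are assumptions chosen for illustration.

```python
import pandas as pd


def remove_outliers_iqr(df: pd.DataFrame, column: str, k: float = 1.5) -> pd.DataFrame:
    """Keep rows whose value lies inside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = df[column].quantile(0.25)
    q3 = df[column].quantile(0.75)
    iqr = q3 - q1
    # between() is inclusive on both ends; rows with NaN in `column` are dropped.
    mask = df[column].between(q1 - k * iqr, q3 + k * iqr)
    return df[mask]


def remove_outliers_zscore(df: pd.DataFrame, column: str, threshold: float = 3.0) -> pd.DataFrame:
    """Keep rows whose absolute z-score is at most `threshold`."""
    mean = df[column].mean()
    std = df[column].std(ddof=0)  # population standard deviation
    if std == 0:
        # A constant column has no outliers by this rule.
        return df.copy()
    z_scores = (df[column] - mean) / std
    return df[z_scores.abs() <= threshold]


# Hypothetical sample: nine plausible ages and one suspicious value.
df = pd.DataFrame({"age": [21, 22, 23, 24, 25, 26, 27, 28, 29, 120]})

iqr_cleaned = remove_outliers_iqr(df, "age")
z_cleaned = remove_outliers_zscore(df, "age", threshold=2.5)
```

On this small sample the IQR rule drops the value 120, while the Z-score rule only drops it once the threshold is lowered to 2.5: the outlier inflates the standard deviation, so its own z-score stays just under 3. This is one reason the IQR rule is often preferred for small or heavily skewed columns.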
Data Visualization (DataVisualizer)
- Histograms and distribution plots
- Box plots for outlier analysis
- Scatter plots for variable relationships
- Correlation heatmaps
- Bar and line charts (categorical/time series)
- Missing value visualization
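As an illustration of what a correlation heatmap involves, the sketch below draws one directly with pandas and Matplotlib. It is not mizuio's DataVisualizer API; the function name `plot_correlation_heatmap`, the colormap, and the sample columns are assumptions.

```python
import io

import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def plot_correlation_heatmap(df: pd.DataFrame):
    """Plot the Pearson correlation matrix of the numeric columns."""
    corr = df.select_dtypes(include="number").corr()
    fig, ax = plt.subplots(figsize=(4, 4))
    # Fix the color scale to [-1, 1] so plots are comparable across datasets.
    image = ax.imshow(corr.to_numpy(), vmin=-1, vmax=1, cmap="coolwarm")
    ax.set_xticks(range(len(corr.columns)))
    ax.set_xticklabels(corr.columns, rotation=45, ha="right")
    ax.set_yticks(range(len(corr.index)))
    ax.set_yticklabels(corr.index)
    fig.colorbar(image, ax=ax, label="Pearson r")
    fig.tight_layout()
    return fig, corr


# Hypothetical sample data; the text column is ignored by the heatmap.
df = pd.DataFrame(
    {
        "age": [20, 30, 40, 50],
        "income": [20, 35, 38, 60],
        "city": ["a", "b", "a", "b"],
    }
)

fig, corr = plot_correlation_heatmap(df)
buffer = io.BytesIO()
fig.savefig(buffer, format="png")  # write to memory instead of a file
plt.close(fig)
```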
Utility Tools (DataUtils)
- Multi-format support: CSV, JSON, Excel, Parquet, Pickle
- Data validation (columns, types, value ranges)
- Data sampling (random, systematic, stratified)
- Data splitting (train/validation/test)
- Categorical encoding (label, one-hot, ordinal)
- Feature scaling (standard, minmax, robust)
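The splitting and scaling steps listed above usually go together: statistics for scaling are computed on the training split only and then reused on the other splits. The following is a rough sketch of that pattern in plain pandas, not mizuio's DataUtils API; `split_frame`, `standard_scale`, and their parameters are hypothetical names.

```python
import pandas as pd


def split_frame(df: pd.DataFrame, val_frac: float = 0.2, test_frac: float = 0.2, seed: int = 0):
    """Shuffle the rows, then cut them into train/validation/test parts."""
    shuffled = df.sample(frac=1.0, random_state=seed).reset_index(drop=True)
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled.iloc[:n_test]
    val = shuffled.iloc[n_test:n_test + n_val]
    train = shuffled.iloc[n_test + n_val:]
    return train, val, test


def standard_scale(train: pd.DataFrame, other: pd.DataFrame, columns: list):
    """Standardize `columns` using the mean and std of the training split only."""
    mean = train[columns].mean()
    # Replace a zero std with 1 so constant columns do not divide by zero.
    std = train[columns].std(ddof=0).replace(0, 1.0)
    return (train[columns] - mean) / std, (other[columns] - mean) / std


# Hypothetical sample data.
df = pd.DataFrame({"x": range(10), "y": [value * 2.0 for value in range(10)]})

train, val, test = split_frame(df)
train_scaled, val_scaled = standard_scale(train, val, ["x", "y"])
```

Fitting the scaler on the training split alone keeps information from the validation and test rows out of the model inputs.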
📦 Installation
Requirements
- Python 3.7+
- pandas
- numpy
- matplotlib
- seaborn
- scikit-learn
Steps
- Clone the repository:
git clone https://github.com/mertskzc/mizuio.git
cd mizuio
- Install dependencies:
pip install -r requirements.txt
- Install in development mode (optional):
pip install -e .
🖥️ Usage
Command Line Interface
mizuio provides a CLI for common data tasks:
# Clean a dataset
mizuio clean data.csv --output cleaned_data.csv --remove-duplicates --fill-missing --remove-outliers
# Visualize a column
mizuio visualize data.csv --plot histogram --column age --output age_hist.png
# Show data info
mizuio info data.csv
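If the CLI needs to be driven from a Python script, for example inside a batch job, one possible approach is to call it through `subprocess`. The sketch below only reuses the flags shown in the clean example above; the helper names `build_clean_command` and `run_clean` are assumptions, and the command runs only if the `mizuio` executable is on the PATH.

```python
import shutil
import subprocess


def build_clean_command(src: str, dst: str) -> list:
    """Build the argument list for the `mizuio clean` example shown above."""
    return [
        "mizuio", "clean", src,
        "--output", dst,
        "--remove-duplicates",
        "--fill-missing",
        "--remove-outliers",
    ]


def run_clean(src: str, dst: str) -> subprocess.CompletedProcess:
    """Run the clean command, failing early if mizuio is not installed."""
    if shutil.which("mizuio") is None:
        raise RuntimeError("mizuio executable not found on PATH")
    # check=True raises CalledProcessError on a non-zero exit status.
    return subprocess.run(build_clean_command(src, dst), check=True)


command = build_clean_command("data.csv", "cleaned_data.csv")
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.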
CLI Commands
- clean: Clean data (remove duplicates, fill missing, remove outliers)
- visualize: Visualize data (histogram, boxplot, scatter, correlation)
- info: Show data summary (shape, memory, columns, missing values, duplicates)
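For readers who want the same kind of summary inside a notebook, the fields reported by the info command (shape, memory, columns, missing values, duplicates) can be approximated with plain pandas. This is a sketch of the idea, not the code mizuio runs; the function name `summarize` and the output layout are assumptions.

```python
import pandas as pd


def summarize(df: pd.DataFrame) -> dict:
    """Collect a small summary similar in spirit to `mizuio info`."""
    return {
        "shape": df.shape,
        # deep=True also counts the memory held by string objects.
        "memory_bytes": int(df.memory_usage(deep=True).sum()),
        "columns": {name: str(dtype) for name, dtype in df.dtypes.items()},
        "missing": df.isna().sum().to_dict(),
        # duplicated() marks every repeat of an earlier, identical row.
        "duplicates": int(df.duplicated().sum()),
    }


# Hypothetical sample data with one duplicate row and two missing cells.
df = pd.DataFrame(
    {
        "age": [25, 25, None, 40],
        "city": ["Oslo", "Oslo", "Rome", None],
    }
)

summary = summarize(df)
```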
🧪 Testing
Run all tests:
python -m pytest tests/
Run a specific test file:
python -m pytest tests/test_cleaner.py
🤝 Contributing
- Fork the repository
- Create a feature branch (git checkout -b feature/your-feature)
- Commit your changes (git commit -m 'Add feature')
- Push your branch (git push origin feature/your-feature)
- Open a Pull Request
📝 License
This project is licensed under the MIT License. See the LICENSE file for details.
📞 Contact
- Project Link: https://github.com/mertskzc/mizuio
- E-mail: mertskzc@gmail.com
🙏 Acknowledgements
mizuio uses the following open source libraries:
- pandas
- NumPy
- Matplotlib
- Seaborn
- scikit-learn