Zero-friction AutoML + Data Cleaning Toolkit
🚀 KaizenStat
KaizenStat is a zero-friction data validation, automatic cleaning, and AutoML benchmarking toolkit designed to fit into your daily data science workflow. It diagnoses and repairs common dataset issues (duplicates, missing values, constant columns) and trains baseline models so you get a first performance comparison quickly.
✨ Features
- 🔍 kz.audit(): Instantly sweep datasets for duplicates, NaNs, infs, constant columns, and target label integrity.
- 🩹 kz.heal(): Automatically clean datasets by repairing missing targets, removing duplicates, dropping dead/constant columns, and imputing missing data using mean, median, or mode.
- 🚀 kz.benchmark(): Auto-detects objectives (classification/regression), builds pre-processing pipelines, trains elite models (Linear/Ridge, RandomForest, Neural Networks), and ranks them on a beautiful leaderboard.
- 💻 CLI Interface: Command line utility (kz) to audit, heal, or benchmark CSV datasets directly from the terminal.
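To make the checks and repairs listed above concrete, here is a minimal pandas sketch of what an audit and a median-based heal could look like. It is not KaizenStat's implementation: the function names `audit_sketch` and `heal_sketch` are hypothetical, and dropping rows with a missing target is an assumption about what "repairing missing targets" might mean.

```python
import numpy as np
import pandas as pd


def audit_sketch(df: pd.DataFrame, target_column: str) -> dict:
    """Count the kinds of problems the feature list mentions (illustrative only)."""
    numeric = df.select_dtypes(include="number")
    return {
        "duplicate_rows": int(df.duplicated().sum()),
        "nan_cells": int(df.isna().sum().sum()),
        "inf_cells": int(np.isinf(numeric.to_numpy(dtype=float)).sum()),
        "constant_columns": [
            c for c in df.columns if df[c].nunique(dropna=False) <= 1
        ],
        "missing_targets": int(df[target_column].isna().sum()),
    }


def heal_sketch(df: pd.DataFrame, target_column: str) -> pd.DataFrame:
    """Apply simple repairs with median imputation (illustrative only)."""
    out = df.copy()

    # Treat +/-inf in numeric columns as missing so it can be imputed later.
    numeric_cols = out.select_dtypes(include="number").columns
    out[numeric_cols] = out[numeric_cols].replace([np.inf, -np.inf], np.nan)

    # Assumption: rows without a target are dropped rather than imputed.
    out = out.dropna(subset=[target_column])
    out = out.drop_duplicates()

    # Drop feature columns that carry a single value.
    constant = [
        c for c in out.columns
        if c != target_column and out[c].nunique(dropna=False) <= 1
    ]
    out = out.drop(columns=constant)

    # Fill remaining numeric gaps in the features with each column's median.
    feature_cols = out.select_dtypes(include="number").columns.drop(
        target_column, errors="ignore"
    )
    out[feature_cols] = out[feature_cols].fillna(out[feature_cols].median())
    return out.reset_index(drop=True)
```

Calling `audit_sketch(df, "target")` before and after `heal_sketch(df, "target")` is a quick way to confirm that the counts drop to zero on your own data.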
📦 Installation
Install KaizenStat from PyPI:
pip install kaizenstat
Or install it locally in editable mode for development:
pip install -e .
🚀 Quickstart Usage
Python API
import pandas as pd
from kaizenstat import KaizenStat
# Load dataset
df = pd.read_csv("data.csv")
# 1. Audit dataset
KaizenStat.audit(df, target_column="target")
# 2. Automatically repair dataset issues
clean_df = KaizenStat.heal(df, target_column="target", method="fill_median")
# 3. Benchmark ML models
leaderboard = KaizenStat.benchmark(clean_df, target_column="target")
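For readers curious what a benchmarking step of this kind involves, the following is a rough scikit-learn sketch of a cross-validated leaderboard. It is an illustration under stated assumptions, not KaizenStat's internals: `benchmark_sketch` is a hypothetical name, only regression is covered, all feature columns are assumed numeric and already cleaned, and the two candidate models and the R² metric are choices made for this example.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def benchmark_sketch(df: pd.DataFrame, target_column: str, cv: int = 3) -> pd.DataFrame:
    """Rank a few regressors by mean cross-validated R^2 (illustrative only)."""
    X = df.drop(columns=[target_column])
    y = df[target_column]

    # Candidate models; the linear model gets scaled inputs via a pipeline.
    candidates = {
        "Ridge": make_pipeline(StandardScaler(), Ridge()),
        "RandomForest": RandomForestRegressor(n_estimators=50, random_state=0),
    }

    rows = []
    for name, model in candidates.items():
        scores = cross_val_score(model, X, y, cv=cv, scoring="r2")
        rows.append({"model": name, "mean_r2": scores.mean()})

    # Best model first, like a leaderboard.
    return (
        pd.DataFrame(rows)
        .sort_values("mean_r2", ascending=False)
        .reset_index(drop=True)
    )
```

The returned DataFrame has one row per model, sorted best first, so `benchmark_sketch(clean_df, "target").iloc[0]` gives the top-ranked candidate.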
💻 Command Line Interface (CLI)
KaizenStat provides a powerful CLI tool named kz right out of the box:
Audit a dataset:
kz audit data.csv --target price
Heal a dataset:
kz heal data.csv --target price --method fill_median -o clean_data.csv
Benchmark a dataset:
kz benchmark clean_data.csv --target price
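If you want to run the three CLI steps above from a script, one option is a thin `subprocess` wrapper. This is a sketch: it uses only the subcommands and flags shown above, the helper names are hypothetical, and it assumes `kz` exits with a non-zero status on failure, which is not confirmed here.

```python
import shutil
import subprocess


def kz_pipeline_commands(raw_csv, clean_csv, target, method="fill_median"):
    """Build the audit -> heal -> benchmark command lines shown in this README."""
    return [
        ["kz", "audit", raw_csv, "--target", target],
        ["kz", "heal", raw_csv, "--target", target,
         "--method", method, "-o", clean_csv],
        ["kz", "benchmark", clean_csv, "--target", target],
    ]


def run_kz_pipeline(raw_csv, clean_csv, target, method="fill_median"):
    """Run each step in order, stopping at the first failing command."""
    if shutil.which("kz") is None:
        raise RuntimeError("kz executable not found; install kaizenstat first")
    for command in kz_pipeline_commands(raw_csv, clean_csv, target, method):
        # check=True raises CalledProcessError on a non-zero exit status
        # (assumption: kz signals failure that way).
        subprocess.run(command, check=True)
```

For example, `run_kz_pipeline("data.csv", "clean_data.csv", "price")` would issue the same three commands as the manual steps.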
🛠 Development and Packaging
Build the package using build:
pip install build twine
python -m build
Upload to PyPI:
twine upload dist/*
📄 License
Distributed under the MIT License. See LICENSE for more information.
File details
Details for the file kaizenstat-0.1.0.tar.gz.
File metadata
- Download URL: kaizenstat-0.1.0.tar.gz
- Upload date:
- Size: 7.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | bea90a8a01089686263238fd722bcf904bd474e5869b0b8d530e8df982f252b4 |
| MD5 | 9d35ceeab3a4397da3b2f2cf70e372d5 |
| BLAKE2b-256 | bfffd7cfc3c3dea588536df811ad85f52d1f36f0001f80c8a182c141c879bb44 |
File details
Details for the file kaizenstat-0.1.0-py3-none-any.whl.
File metadata
- Download URL: kaizenstat-0.1.0-py3-none-any.whl
- Upload date:
- Size: 8.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b46adfe383d6a53b171111a9362be01880f3943f2092eb9e3be02d259de39f93 |
| MD5 | ef8309136bca8ee22c282fbed0331957 |
| BLAKE2b-256 | 5ed3648cca61171f94cf88481267922d5dd9dbf0d1f1c1831586116f37a06b57 |
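The SHA256 digests listed above can be checked against a downloaded file with Python's standard `hashlib`. This is a minimal sketch; the helper names are hypothetical, and the local file path is whatever location you downloaded the archive to.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex string."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For instance, `matches_published_digest("kaizenstat-0.1.0.tar.gz", "<SHA256 from the table>")` should return `True` for an unmodified download of the source distribution.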