Zero-friction AutoML + Data Cleaning Toolkit

🚀 KaizenStat

KaizenStat is a zero-friction toolkit for data validation, automatic cleaning, and AutoML benchmarking. It diagnoses datasets, repairs common issues, trains baseline models, generates standalone Python code, and launches interactive dashboards, each from a single command or API call.


✨ Features

Command           What it does
----------------  ------------------------------------------------------------------
kz audit          🔍 Diagnostic sweep: duplicates, NaNs, infs, ID columns, imbalance
kz heal           🩹 Auto-clean: impute, deduplicate, drop dead columns
kz benchmark      🚀 Train & rank ML models with cross-validation
kz auto           ⚡ Full pipeline in one command (audit → heal → benchmark)
kz explain        💬 Plain-English summary of findings and recommendations
kz codegen        📝 Generate a standalone Python training script
kz export-model   💾 Train best model and save to .joblib
kz report         📊 Generate interactive HTML report with charts
kz serve          🌐 Launch interactive Streamlit web dashboard
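
To make the audit checks listed above concrete, here is a minimal pandas sketch of the same kinds of checks (duplicates, NaNs, infs, ID-like columns, imbalance). It is not KaizenStat's implementation: the function name audit_sketch, the report keys, and the "all values distinct" rule for ID columns are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_float_dtype


def audit_sketch(df: pd.DataFrame, target: str) -> dict:
    """Hypothetical stand-in for the kinds of checks an audit might run."""
    numeric = df.select_dtypes(include="number")
    features = df.drop(columns=[target])
    return {
        # Rows that are exact copies of an earlier row
        "duplicate_rows": int(df.duplicated().sum()),
        # Missing values, reported only for columns that have any
        "nan_counts": {c: int(n) for c, n in df.isna().sum().items() if n > 0},
        # +inf / -inf values in numeric columns
        "inf_counts": {c: int(n) for c, n in np.isinf(numeric).sum().items() if n > 0},
        # Heuristic (assumption): a non-float column whose values are all
        # distinct looks like a row identifier rather than a feature
        "id_like_columns": [
            c for c in features.columns
            if not is_float_dtype(features[c]) and features[c].nunique() == len(df)
        ],
        # Share of the most frequent target value; values near 1.0 suggest imbalance
        "majority_class_share": float(df[target].value_counts(normalize=True).iloc[0]),
    }


demo = pd.DataFrame({
    "row_id": [1, 2, 3, 4],
    "size": [1.0, np.nan, np.inf, 1.0],
    "label": ["a", "a", "a", "b"],
})
report = audit_sketch(demo, target="label")
for check, finding in report.items():
    print(f"{check}: {finding}")
```

The majority-class share only makes sense for classification targets; for a regression target such as price, a real audit would presumably skip or replace that check.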

📦 Installation

pip install kaizenstat

Optional extras:

pip install "kaizenstat[ui]"     # + Streamlit dashboard
pip install "kaizenstat[gpu]"    # + XGBoost GPU support
pip install "kaizenstat[fast]"   # + Polars fast data loading
pip install "kaizenstat[all]"    # everything

🚀 Quick Start

Python API

from kaizenstat import KaizenStat

# Full pipeline in one call
KaizenStat.auto("data.csv", target="price")

# Or step-by-step
import pandas as pd
df = pd.read_csv("data.csv")

KaizenStat.audit(df, target="price")
df_clean = KaizenStat.heal(df, target="price")
results = KaizenStat.benchmark(df_clean, target="price")
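
As a rough illustration of what a heal step of this kind could involve (impute, deduplicate, drop dead columns), the sketch below uses plain pandas. It is not how KaizenStat.heal works internally: heal_sketch is a hypothetical name, and the choices of median and mode imputation and of treating constant or all-missing columns as "dead" are assumptions.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype


def heal_sketch(df: pd.DataFrame, target: str) -> pd.DataFrame:
    """Hypothetical cleaning pass; every strategy here is an assumption."""
    # Treat infinities as missing so they are imputed with everything else
    out = df.replace([np.inf, -np.inf], np.nan)
    # Remove exact duplicate rows, then rows with no target value
    out = out.drop_duplicates()
    out = out.dropna(subset=[target])
    # "Dead" columns: entirely missing, or holding a single constant value
    dead = [c for c in out.columns if c != target and out[c].nunique(dropna=True) <= 1]
    out = out.drop(columns=dead)
    # Impute what is left: median for numeric columns, most frequent value otherwise
    for col in out.columns:
        if col == target or not out[col].isna().any():
            continue
        if is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].median())
        else:
            out[col] = out[col].fillna(out[col].mode().iloc[0])
    return out.reset_index(drop=True)


raw = pd.DataFrame({
    "price": [10.0, 10.0, 20.0, 30.0, np.nan],
    "rooms": [1.0, 1.0, np.nan, 3.0, 2.0],
    "city": ["A", "A", None, "B", "B"],
    "empty": [np.nan] * 5,
    "const": [7] * 5,
})
clean = heal_sketch(raw, target="price")
print(clean)
```

Imputing before a train/test split, as this sketch does, leaks information across folds; a production pipeline would normally fit imputers inside the cross-validation loop.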

💬 Get a Plain-English Explanation

KaizenStat.explain("data.csv", target="price")

📝 Generate Standalone Code

KaizenStat.codegen("data.csv", target="price", output_path="deploy.py")

💾 Export & Load Models

# Train + save
KaizenStat.auto("data.csv", target="price")
KaizenStat.save_model(path="model.joblib")

# Load later
pipeline = KaizenStat.load_model("model.joblib")
# new_data is assumed to be a DataFrame with the same feature columns as data.csv
predictions = pipeline.predict(new_data)
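
For readers unfamiliar with the underlying pattern, the following sketch shows the general idea of ranking models by cross-validation, then saving and reloading the winner as a .joblib file, using scikit-learn and joblib directly. It is an illustration under assumptions, not KaizenStat's code: the two candidate models, the synthetic data, and the R² scoring are all chosen for the example.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data standing in for a cleaned dataset
X, y = make_regression(n_samples=200, n_features=5, noise=1.0, random_state=0)

# Candidate pipelines (an arbitrary pair chosen for this sketch)
candidates = {
    "linear": make_pipeline(StandardScaler(), LinearRegression()),
    "tree": make_pipeline(StandardScaler(), DecisionTreeRegressor(random_state=0)),
}

# Rank candidates by mean 5-fold cross-validated R^2, best first
ranking = sorted(
    (
        (name, cross_val_score(model, X, y, cv=5, scoring="r2").mean())
        for name, model in candidates.items()
    ),
    key=lambda item: item[1],
    reverse=True,
)
for name, score in ranking:
    print(f"{name}: mean R^2 = {score:.3f}")

# Refit the best candidate on all data, save it, and load it back
best_name = ranking[0][0]
best_model = candidates[best_name].fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "model.joblib"
    joblib.dump(best_model, path)
    restored = joblib.load(path)

# The restored pipeline should reproduce the original predictions
same_predictions = np.allclose(best_model.predict(X[:5]), restored.predict(X[:5]))
print("round-trip predictions match:", same_predictions)
```

Saving the whole pipeline, rather than just the estimator, keeps preprocessing and model together, so new data can be passed to predict without repeating the scaling step by hand.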

📊 Generate HTML Report

KaizenStat.report("data.csv", target="price", output_path="report.html")

🌐 Launch Web Dashboard

KaizenStat.serve("data.csv", target="price")

💻 CLI Usage

# Diagnostic sweep
kz audit data.csv --target price

# Auto-clean dataset
kz heal data.csv --target price -o clean.csv

# Train & rank models
kz benchmark clean.csv --target price

# Full pipeline
kz auto data.csv --target price

# Plain-English summary
kz explain data.csv --target price

# Generate standalone Python script
kz codegen data.csv --target price -o deploy.py

# Train best model and export
kz export-model data.csv --target price -o model.joblib

# Generate interactive HTML report
kz report data.csv --target price -o report.html

# Launch web dashboard
kz serve data.csv --target price
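
The commands above can also be chained from a script. The sketch below builds the heal and benchmark invocations exactly as written in this section and runs them only if the kz executable and data.csv are present. The helper kz_command is a hypothetical convenience, not part of KaizenStat, and it assumes no flags beyond --target and -o as shown above.

```python
import shutil
import subprocess
from pathlib import Path


def kz_command(subcommand, data, target, output=None):
    """Build an argument list for the kz CLI (hypothetical helper)."""
    cmd = ["kz", subcommand, data, "--target", target]
    if output is not None:
        cmd += ["-o", output]
    return cmd


# Clean first, then benchmark the cleaned file
steps = [
    kz_command("heal", "data.csv", "price", output="clean.csv"),
    kz_command("benchmark", "clean.csv", "price"),
]

# Only run when the CLI is installed and the input file exists
if shutil.which("kz") is not None and Path("data.csv").exists():
    for cmd in steps:
        # check=True stops the chain as soon as one step fails
        subprocess.run(cmd, check=True)
else:
    for cmd in steps:
        print("would run:", " ".join(cmd))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with file names that contain spaces.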

🛠 Development

git clone https://github.com/yourusername/kaizenstat.git
cd kaizenstat
pip install -e ".[all]"

📄 License

Distributed under the MIT License.
