Zero-friction AutoML + Data Cleaning Toolkit
🚀 KaizenStat
Official Website: www.kaizenstat.com
KaizenStat is a zero-friction, production-grade AutoML, automated data cleaning, and model explanation engine. It lets you audit datasets, repair data issues, benchmark models with hardware-aware optimization, export standalone pipeline code, and host web-based dashboards, all with a single command or Python import.
🎯 Core Philosophy
- Zero-Friction AutoML: No complex configuration files. Pass your dataset, name your target, and KaizenStat does the rest.
- Production Crash-Proofing: Automatically handles messy real-world data issues: high-cardinality ID columns, datetime parsing, missing inputs, class imbalance, and label encoding.
- Explainable AI: Breaks open the "black box" by generating standalone, human-readable Python training code reproducing the best-found pipeline.
- Hybrid Interface: 100% parity between CLI and Python API.
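To make the "high-cardinality ID columns" case concrete, here is a minimal sketch of one way such columns could be detected and dropped. It is not KaizenStat's implementation: the function names, the 0.95 uniqueness threshold, and the plain-dict column layout are all assumptions chosen for illustration. With pandas, the same ratio could be computed as `df[col].nunique() / len(df)`.

```python
def looks_like_id(values, min_unique_ratio: float = 0.95) -> bool:
    """Heuristic (hypothetical): a string or integer column whose values
    are almost all distinct behaves like a row identifier."""
    non_missing = [v for v in values if v is not None]
    if not non_missing:
        return False
    # Continuous floats are also mostly unique, so only consider
    # string and integer columns (bool is excluded explicitly).
    if not all(isinstance(v, (int, str)) and not isinstance(v, bool)
               for v in non_missing):
        return False
    return len(set(non_missing)) / len(non_missing) >= min_unique_ratio


def drop_id_columns(columns: dict, target: str) -> dict:
    """Return a copy of `columns` without ID-like features.
    The target column is always kept."""
    return {name: vals for name, vals in columns.items()
            if name == target or not looks_like_id(vals)}


if __name__ == "__main__":
    data = {
        "user_id": ["u1", "u2", "u3", "u4"],
        "age": [30, 30, 41, 41],
        "label": [0, 1, 0, 1],
    }
    print(list(drop_id_columns(data, target="label")))
```

A ratio-based rule like this is cheap, but it can misfire on small datasets where genuine features happen to be unique, so a real pipeline would likely combine it with column-name or dtype hints.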
📦 Installation
Install the core package with zero heavy external dependencies:
pip install kaizenstat
Optional Drivers & Accelerators
Tailor KaizenStat to your specific workload:
pip install kaizenstat[ui] # Install Streamlit for web dashboards
pip install kaizenstat[gpu] # Install XGBoost with GPU/MPS support
pip install kaizenstat[fast] # Install Polars for ultra-fast CSV parsing
pip install kaizenstat[all] # Install all optional components
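If you are unsure which extras are present in an environment, a small check like the one below can help. It is a sketch that only uses the standard library; the extra-to-module mapping follows the comments above (Streamlit, XGBoost, Polars), and the helper name `installed_extras` is hypothetical rather than part of KaizenStat.

```python
from importlib.util import find_spec

# Extra name -> top-level module that extra is expected to install,
# based on the install comments above.
EXTRA_MODULES = {
    "ui": "streamlit",
    "gpu": "xgboost",
    "fast": "polars",
}


def installed_extras() -> dict[str, bool]:
    """Report which optional backends are importable, without importing them."""
    return {extra: find_spec(module) is not None
            for extra, module in EXTRA_MODULES.items()}


if __name__ == "__main__":
    for extra, available in installed_extras().items():
        state = "available" if available else "missing"
        print(f"kaizenstat[{extra}]: {state}")
```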
⚔️ CLI & Python API Feature Matrix
KaizenStat is designed around a single unified vocabulary. Every CLI command has a direct, equivalent function in the Python SDK.
| Command | Python API | Purpose |
|---|---|---|
| `kz audit` | `KaizenStat.audit()` | 🔍 Runs a diagnostic sweep (missing values, duplicates, imbalance, dead features). |
| `kz heal` | `KaizenStat.heal()` | 🩹 Clean, impute, parse datetimes, drop IDs, and encode string labels. |
| `kz benchmark` | `KaizenStat.benchmark()` | 🚀 Automatically trains, optimizes, and ranks model pipelines. |
| `kz auto` | `KaizenStat.auto()` | ⚡ Orchestrates the entire pipeline in sequence (Audit ➔ Heal ➔ Benchmark). |
| `kz explain` | `KaizenStat.explain()` | 💬 Generates plain-English diagnostic summaries and model recommendations. |
| `kz codegen` | `KaizenStat.codegen()` | 📝 Generates standalone, dependency-free Python code for the best model. |
| `kz export-model` | `KaizenStat.save_model()` | 💾 Trains the top pipeline and saves it directly to a .joblib binary. |
| `kz report` | `KaizenStat.report()` / `KaizenStat.serve_report()` | 📊 Generates HTML report (auto-opens browser) and serves it on a local web port (with `--serve` or `serve_report()`). |
| `kz serve` | `KaizenStat.serve()` | 🖥️ Launches a local interactive Streamlit app dashboard. |
| `kz analyze` | `KaizenStat.analyze()` | 🧠 Executes auto-intelligence analysis over dataset context using LLM reasoning. |
| `kz ask` | `KaizenStat.ask()` | 🤖 Answers complex developer queries about accuracy, data quality, or anomalies. |
| `kz ask --followup` | `KaizenStat.ask_followup()` | 🔁 Maintains multi-turn conversation memory with the data reasoning engine. |
| `kz improve` | `KaizenStat.improve()` | 🚀 Query AI to get next best actions and improvement plans. |
| `kz status` | N/A (CLI Only) | 📊 Show active system and dataset context status. |
| `kz reset` | N/A (CLI Only) | 🧹 Reset conversational memory and active dataset context. |
💡 Quick Start Guide
1. Python SDK Usage
from kaizenstat import KaizenStat
import pandas as pd
# Load your dataset
df = pd.read_csv("dataset.csv")
# 1. Diagnose issues
findings = KaizenStat.audit(df, target="target_column")
# 2. Automatically heal dirty data
clean_df = KaizenStat.heal(df, target="target_column")
# 3. Benchmark models with cross-validation
leaderboard = KaizenStat.benchmark(clean_df, target="target_column")
# 4. Generate standalone code for reproduction
KaizenStat.codegen("dataset.csv", target="target_column", output_path="reproduce.py")
# 5. Generate and open interactive HTML profiling report
# By default, report() automatically opens in your default browser.
report_path = KaizenStat.report("dataset.csv", target="target_column")
# Or serve the HTML report temporarily on a local web port (great for remote/headless setups)
KaizenStat.serve_report(report_path)
# 6. Dual-Mode Conversational AI (OpenRouter powered)
# Runs automated structured AI analysis
analysis = KaizenStat.analyze(df, target="target_column")
# Ask custom developer queries about data or pipeline
KaizenStat.ask("Why is model accuracy lower or what are the dataset flaws?")
# Multi-turn conversation with memory context
KaizenStat.ask_followup("What should I do to handle the missing values or high cardinality?")
# 7. Get actionable next-step recommendations
KaizenStat.improve()
2. Command Line Interface (CLI)
# Get quick help and list commands
kz --help
# Run the full pipeline in one command
kz auto dataset.csv --target target_column
# Repair a dataset and save the clean file
kz heal dataset.csv --target target_column -o cleaned_dataset.csv
# Launch a local Streamlit app to preview and test model performance
kz serve dataset.csv --target target_column --port 8501
# Generate HTML report and automatically open it in your browser
kz report dataset.csv target_column
# Generate HTML report and serve it on a temporary local port with live web hosting
kz report dataset.csv target_column --serve
# Execute AI diagnostic analysis (saves context locally)
kz analyze dataset.csv --target target_column
# Ask conversational queries about data quality
kz ask "Why is model accuracy low?"
# Ask followup query with conversation memory persistence
kz ask "What should I do to handle the missing values?" --followup
# Get next best actions / actionable improvement plan
kz improve
# View active system and dataset context status
kz status
# Reset conversational memory and session cache
kz reset
🧠 Behind the Scenes: Core Engines
1. Hardware-Aware Execution
KaizenStat checks your environment automatically using detect_device(). It uses CUDA on NVIDIA GPUs and MPS on Apple Silicon (M1/M2/M3 Macs) to accelerate training when optional dependencies (like xgboost) are installed.
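The internals of detect_device() are not documented here, so the following is only a rough sketch of how a CUDA / MPS / CPU decision could be made with the standard library. The function name `guess_device` and both heuristics (looking for `nvidia-smi` on the PATH, checking for Darwin on arm64) are assumptions, not KaizenStat's actual logic.

```python
import platform
import shutil


def guess_device() -> str:
    """Return "cuda", "mps", or "cpu" from coarse environment hints."""
    # An nvidia-smi binary on PATH suggests an NVIDIA driver is installed.
    if shutil.which("nvidia-smi") is not None:
        return "cuda"
    # Apple Silicon machines report Darwin as the OS and arm64 as the CPU.
    if platform.system() == "Darwin" and platform.machine() == "arm64":
        return "mps"
    # Fall back to plain CPU execution.
    return "cpu"


if __name__ == "__main__":
    print(f"Selected device: {guess_device()}")
```

Checks like these only indicate that suitable hardware is probably present. Whether a given library can actually use it still depends on how that library was built and installed.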
2. Smart Model Selection
The benchmarking engine adjusts its logic dynamically based on the dataset properties:
- Large Datasets (>100k rows): Excludes slow estimators (like Gradient Boosting) on standard CPU hosts to prevent compute lockups.
- High-Cardinality Categoricals: Optimizes feature preprocessors and prioritizes tree-based models (Random Forests, Gradient Boosting, XGBoost).
- Float Targets: Detects values with a continuous numeric profile and switches the entire pipeline to regression mode automatically.
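The rules above can be sketched as two small functions. This is an illustration under stated assumptions, not the benchmarking engine itself: the model names, the `has_gpu` flag, and the "any non-integer numeric value means regression" heuristic are all hypothetical. Only the 100k-row threshold comes from the list above.

```python
import math
from numbers import Real

LARGE_DATASET_ROWS = 100_000  # threshold quoted in the list above


def infer_task(target_values) -> str:
    """Hypothetical heuristic: a target containing any finite,
    non-integer number is treated as a regression target."""
    for value in target_values:
        # Skip booleans and non-numeric labels such as strings.
        if isinstance(value, bool) or not isinstance(value, Real):
            continue
        number = float(value)
        if math.isfinite(number) and not number.is_integer():
            return "regression"
    return "classification"


def candidate_models(n_rows: int, has_gpu: bool) -> list[str]:
    """Drop the slow estimator on large datasets when only a CPU is available."""
    models = ["random_forest", "gradient_boosting", "xgboost"]  # illustrative names
    if n_rows > LARGE_DATASET_ROWS and not has_gpu:
        models.remove("gradient_boosting")
    return models


if __name__ == "__main__":
    print(infer_task([0.5, 1.25, 3.0]))
    print(candidate_models(250_000, has_gpu=False))
```

Note that under this heuristic a target such as `[1.0, 2.0, 3.0]` counts as classification, so a real implementation would probably also look at the column dtype and the number of distinct values.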
3. Automatic Imbalance Correction
During data healing, KaizenStat computes the target class ratios. If the class distribution is more skewed than 65% / 35%, it adjusts model parameters dynamically (e.g. setting class_weight="balanced" in scikit-learn estimators).
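As a minimal sketch of this check, the code below measures the majority-class share and decides whether to request balanced weights. The helper names are hypothetical, and reading the 65% / 35% rule as "majority share above 0.65" is an assumption. The weight formula is the one scikit-learn documents for class_weight="balanced": n_samples / (n_classes * class_count).

```python
from collections import Counter

MAJORITY_THRESHOLD = 0.65  # the 65% / 35% split described above


def majority_share(labels) -> float:
    """Fraction of samples that belong to the most frequent class."""
    counts = Counter(labels)
    if not counts:
        raise ValueError("labels must not be empty")
    return max(counts.values()) / sum(counts.values())


def balanced_weights(labels) -> dict:
    """Per-class weights equivalent to scikit-learn's "balanced" mode."""
    counts = Counter(labels)
    n_samples = sum(counts.values())
    return {cls: n_samples / (len(counts) * count)
            for cls, count in counts.items()}


def choose_class_weight(labels):
    """Return "balanced" when the target is skewed past the threshold, else None."""
    return "balanced" if majority_share(labels) > MAJORITY_THRESHOLD else None


if __name__ == "__main__":
    y = [0] * 80 + [1] * 20
    print(majority_share(y), choose_class_weight(y), balanced_weights(y))
```

The value returned by `choose_class_weight` can be passed straight to estimators that accept the parameter, for example `LogisticRegression(class_weight=choose_class_weight(y))`.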
🛠 Developer Guide
Setting up a local workspace
To contribute or run local enhancements:
- Clone the repository:
git clone https://github.com/masuddarrahaman/KaizenStat-Library.git
cd KaizenStat-Library
- Install the package in editable mode with all optional drivers:
pip install -e ".[all]"
- Run tests or validation:
python3 -m unittest discover -s tests
📄 License
Distributed under the MIT License. See LICENSE for details.