
Zero-friction AutoML + Data Cleaning Toolkit


🚀 KaizenStat


KaizenStat is a zero-friction toolkit for data validation, automatic cleaning, and AutoML benchmarking, designed to fit into an everyday data science workflow. It flags common dataset issues, repairs them, and trains baseline models so you get a quick performance reference before deeper modeling.


✨ Features

  • 🔍 kz.audit(): Scans a dataset for duplicate rows, NaNs, infinite values, constant columns, and target-label integrity.
  • 🩹 kz.heal(): Automatically cleans a dataset by repairing missing targets, removing duplicates, dropping dead/constant columns, and imputing missing values with the mean, median, or mode.
  • 🚀 kz.benchmark(): Auto-detects the objective (classification or regression), builds preprocessing pipelines, trains a set of baseline models (Linear/Ridge, Random Forest, neural networks), and ranks them on a leaderboard.
  • 💻 CLI: A command-line utility (kz) to audit, heal, or benchmark CSV datasets directly from the terminal.
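To make the audit checks concrete, here is a minimal pandas sketch of roughly what they amount to. `audit_sketch` is a hypothetical helper, not part of KaizenStat, and the exact rules shown (for example, treating any column with at most one distinct non-missing value as constant) are assumptions rather than the library's documented behavior.

```python
import numpy as np
import pandas as pd


def audit_sketch(df: pd.DataFrame, target_column: str) -> dict:
    """Hypothetical approximation of the checks kz.audit() is described as running.

    This is NOT KaizenStat's implementation; it only mirrors the feature list.
    """
    numeric = df.select_dtypes(include=[np.number])
    has_target = target_column in df.columns
    return {
        # Rows that exactly repeat an earlier row
        "duplicate_rows": int(df.duplicated().sum()),
        # Missing values (NaN/None) per column, only columns that have any
        "nan_counts": {c: int(n) for c, n in df.isna().sum().items() if n > 0},
        # +/- infinity, checked on numeric columns only
        "inf_counts": {c: int(n) for c, n in np.isinf(numeric).sum().items() if n > 0},
        # Assumed definition: at most one distinct non-missing value
        "constant_columns": [c for c in df.columns if df[c].nunique(dropna=True) <= 1],
        # Target integrity: the column exists and how many labels are missing
        "target_present": has_target,
        "missing_target": int(df[target_column].isna().sum()) if has_target else None,
    }


df = pd.DataFrame({
    "x": [1.0, np.nan, np.inf, 1.0],
    "c": [5, 5, 5, 5],
    "target": [0, 1, None, 0],
})
report = audit_sketch(df, target_column="target")
print(report)
```

Returning a plain dict keeps the sketch easy to inspect or log; the real audit() may print a formatted report instead.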

📦 Installation

Install KaizenStat from PyPI:

pip install kaizenstat

Or install it locally in editable mode for development:

pip install -e .

🚀 Quickstart Usage

Python API

import pandas as pd
from kaizenstat import KaizenStat

# Load dataset
df = pd.read_csv("data.csv")

# 1. Audit dataset
KaizenStat.audit(df, target_column="target")

# 2. Automatically repair dataset issues
clean_df = KaizenStat.heal(df, target_column="target", method="fill_median")

# 3. Benchmark ML models
leaderboard = KaizenStat.benchmark(clean_df, target_column="target")
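The repair steps that heal() is described as performing can be approximated in a few lines of pandas. This is a sketch under stated assumptions, not KaizenStat's code: `heal_sketch` is hypothetical, "repairing missing targets" is interpreted here as dropping rows without a target value, the order of steps is assumed, and `fill_mean`/`fill_mode` are assumed method names alongside the documented `fill_median`.

```python
import pandas as pd


def heal_sketch(df: pd.DataFrame, target_column: str, method: str = "fill_median") -> pd.DataFrame:
    """Hypothetical approximation of heal(); not KaizenStat's implementation."""
    # Assumed interpretation of "repairing missing targets": drop unlabeled rows
    out = df.dropna(subset=[target_column])
    # Remove exact duplicate rows
    out = out.drop_duplicates()
    # Drop dead/constant feature columns (<= 1 distinct non-missing value); keep the target
    dead = [c for c in out.columns if c != target_column and out[c].nunique(dropna=True) <= 1]
    out = out.drop(columns=dead)

    for col in out.columns:
        if col == target_column or not out[col].isna().any():
            continue
        numeric = pd.api.types.is_numeric_dtype(out[col])
        if method == "fill_median" and numeric:
            value = out[col].median()
        elif method == "fill_mean" and numeric:  # assumed method name
            value = out[col].mean()
        else:
            # Mode for non-numeric columns or method="fill_mode" (assumed name);
            # mode() returns sorted values, so ties resolve to the smallest one
            value = out[col].mode(dropna=True).iloc[0]
        out[col] = out[col].fillna(value)
    return out.reset_index(drop=True)


raw = pd.DataFrame({
    "age": [30.0, None, 50.0, 30.0, 40.0, 10.0],
    "city": ["A", "A", None, "A", "B", "B"],
    "const": [1, 1, 1, 1, 1, 1],
    "target": [0, 1, 1, 0, None, 0],
})
clean = heal_sketch(raw, target_column="target", method="fill_median")
print(clean)
```

Dropping constant columns before imputing means every column that still needs filling has at least two distinct values, so the mode fallback always has something to return.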

💻 Command Line Interface (CLI)

Installing KaizenStat also installs a command-line tool named kz:

Audit a dataset:

kz audit data.csv --target price

Heal a dataset:

kz heal data.csv --target price --method fill_median -o clean_data.csv

Benchmark a dataset:

kz benchmark clean_data.csv --target price
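As a sketch of how these commands and flags fit together, the argparse layout below models `kz audit`, `kz heal`, and `kz benchmark`. It is hypothetical and not KaizenStat's actual CLI code: only the subcommand names and the `--target`, `--method`, and `-o` flags come from the examples above, while the positional name `csv_path` and the `output` destination are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser modeling the documented `kz` commands.

    Subcommand names and the --target/--method/-o flags come from the README
    examples; everything else (argument names, help text) is assumed.
    """
    parser = argparse.ArgumentParser(prog="kz")
    sub = parser.add_subparsers(dest="command", required=True)

    for name in ("audit", "heal", "benchmark"):
        cmd = sub.add_parser(name)
        # Assumed positional name for the input CSV
        cmd.add_argument("csv_path", help="input CSV file")
        cmd.add_argument("--target", help="name of the target column")
        if name == "heal":
            # Only heal takes an imputation method and an output path
            cmd.add_argument("--method", help="imputation strategy, e.g. fill_median")
            cmd.add_argument("-o", dest="output", help="path for the cleaned CSV")
    return parser


args = build_parser().parse_args(
    ["heal", "data.csv", "--target", "price", "--method", "fill_median", "-o", "clean_data.csv"]
)
print(args.command, args.csv_path, args.target, args.method, args.output)
```

Using one subparser per command keeps heal-only options such as `--method` out of `audit` and `benchmark`.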

🛠 Development and Packaging

Install the build and upload tools, then build the source distribution and wheel:

pip install build twine
python -m build

Upload to PyPI:

twine upload dist/*

📄 License

Distributed under the MIT License. See LICENSE for more information.


Download files

  • Source distribution: kaizenstat-0.1.0.tar.gz (7.7 kB)
  • Built distribution: kaizenstat-0.1.0-py3-none-any.whl (8.2 kB, Python 3)

File details

Details for the file kaizenstat-0.1.0.tar.gz.

File metadata

  • Download URL: kaizenstat-0.1.0.tar.gz
  • Upload date:
  • Size: 7.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for kaizenstat-0.1.0.tar.gz

  • SHA256: bea90a8a01089686263238fd722bcf904bd474e5869b0b8d530e8df982f252b4
  • MD5: 9d35ceeab3a4397da3b2f2cf70e372d5
  • BLAKE2b-256: bfffd7cfc3c3dea588536df811ad85f52d1f36f0001f80c8a182c141c879bb44

File details

Details for the file kaizenstat-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: kaizenstat-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 8.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for kaizenstat-0.1.0-py3-none-any.whl

  • SHA256: b46adfe383d6a53b171111a9362be01880f3943f2092eb9e3be02d259de39f93
  • MD5: ef8309136bca8ee22c282fbed0331957
  • BLAKE2b-256: 5ed3648cca61171f94cf88481267922d5dd9dbf0d1f1c1831586116f37a06b57
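A downloaded archive can be checked against its published SHA256 digest with Python's standard hashlib module. The sketch below hard-codes the two SHA256 digests listed for kaizenstat 0.1.0; the helper names `sha256_of` and `verify` are illustrative, not part of KaizenStat.

```python
import hashlib
from pathlib import Path

# SHA256 digests published for the kaizenstat 0.1.0 distribution files
EXPECTED_SHA256 = {
    "kaizenstat-0.1.0.tar.gz": "bea90a8a01089686263238fd722bcf904bd474e5869b0b8d530e8df982f252b4",
    "kaizenstat-0.1.0-py3-none-any.whl": "b46adfe383d6a53b171111a9362be01880f3943f2092eb9e3be02d259de39f93",
}


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so a large download is never read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path) -> bool:
    """Return True if the file's SHA256 matches the published digest for its name."""
    expected = EXPECTED_SHA256.get(path.name)
    if expected is None:
        raise KeyError(f"no published hash for {path.name}")
    return sha256_of(path) == expected
```

For example, `verify(Path("kaizenstat-0.1.0.tar.gz"))` returns True only for an unmodified copy of the source archive.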
