Deep Insights EDA — Comprehensive data profiling with global AI techniques

Project description

🔬 Khadee EDA — Deep Insights Data Profiling & Cleaning

khadee-eda is a high-performance exploratory data analysis (EDA) and data cleaning library. It generates interactive, glassmorphism-themed HTML profiling reports from any supported dataset and provides a lightweight suite of cleaning tools comparable to dataprep.clean.

Tip: Learning data science and EDA? A beginner-friendly Learning & Reference Guide explains the library functions used in this project (NumPy, Pandas, SciPy, Scikit-Learn) along with its advanced regional statistics.

⚡ Quick Start

1. Generating a Profiling Report

Auto-detects and loads data from CSV, Excel, JSON, Parquet, SQLite, and 10+ other formats.

from k_eda import ProfileReport

# Method A: Direct one-liner from file path
report = ProfileReport("train.csv", title="E-Commerce Analysis")
report.to_html("report.html")

# Method B: From an existing Pandas DataFrame
import pandas as pd
df = pd.read_csv("train.csv")
report = ProfileReport(df, title="Customer Profiles")
report.to_html("report.html")

2. High-Performance Data Cleaning (`k_eda.clean`)

Direct, unified API for cleaning, standardizing, and preparing data (a lightweight alternative to dataprep.clean).

from k_eda import clean

# Clean column headers (standardize to snake_case, PascalCase, camelCase, etc.)
df_clean = clean.clean_headers(df, case="snake")

# Impute missing values with mean, median, mode, or constant value
df_clean = clean.clean_missing(df_clean, columns=["age", "income"], strategy="median")

# Handle outliers by clipping (winsorization) or dropping rows
df_clean = clean.clean_outliers(df_clean, columns=["fare"], method="iqr", strategy="clip")

# Normalize and clean text columns (strip spaces, lowercase, remove special characters)
df_clean = clean.clean_text(df_clean, columns=["product_desc"], lowercase=True, remove_special=True)

# Remove duplicate rows
df_clean = clean.clean_duplicates(df_clean, columns=["user_id"])

# Run a complete, standard cleanup pass
df_clean = clean.clean_df(df)
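To make the `method="iqr", strategy="clip"` combination above more concrete, here is a minimal sketch of IQR-based clipping written in plain pandas. It is not the library's implementation: the helper name `clip_outliers_iqr` and the fence multiplier `k=1.5` (the conventional Tukey value) are assumptions made for illustration.

```python
import pandas as pd


def clip_outliers_iqr(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Clip values outside [Q1 - k*IQR, Q3 + k*IQR] to the nearest fence.

    Hypothetical helper for illustration; k=1.5 is an assumed default.
    """
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return series.clip(lower=q1 - k * iqr, upper=q3 + k * iqr)


# Example: one extreme fare is pulled back to the upper fence,
# while the remaining values are left unchanged.
fares = pd.Series([7.25, 8.05, 13.0, 26.0, 512.33])
clipped = clip_outliers_iqr(fares)
print(clipped.tolist())
```

Clipping keeps every row, which is usually preferable to dropping when the outlying rows still carry useful values in other columns.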

📊 10 Structured Analysis Sections

Each HTML report is divided into 10 structured, deeply interactive sections:

🏠 Overview: High-level dataset shapes, reproduction metadata, alerts (missing cells, zero values, extreme correlations, duplicates), and detected data types.
📊 Variables (Interactive Dropdown Explorer): Detailed statistics per variable (quantiles, descriptives, frequencies, categories). Includes a custom select dropdown menu to show/hide column details and dynamically resize Plotly visualizations.
📈 Distributions: Visual analysis of distributions via histogram grids, kernel density estimations (KDE), skewness, kurtosis, and normality tests.
🔗 Correlations: Pairwise comparison matrices using Pearson, Spearman, Kendall, and Cramér's V metrics represented as interactive heatmaps.
❓ Missing Values: Visual representation of missing data via matrices, counts, and imputation recommendations.
🎯 Outliers: Outlier diagnostics based on IQR, Z-score, Median Absolute Deviation (MAD), and Isolation Forest detection.
🔄 Interactions: Interactive bivariate scatter plots and grouped box plots.
📐 Advanced Stats (Global AI Hub Methodologies): Unique statistical and machine learning frameworks tailored after analytical cultures across the globe (see below).
🤖 Model Readiness: Preprocessing checklists, ML model suitability rankings, and code recommendation generators.
📋 Sample: Interactive data table viewer showing the head, tail, duplicates, and data dictionary.
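Of the outlier methods listed for the Outliers section, MAD is probably the least familiar. The sketch below shows one common formulation, the modified z-score, using only the standard library. The function name `mad_outliers`, the 0.6745 scaling constant, and the 3.5 threshold are conventional choices assumed here; the report may use different settings.

```python
from statistics import median


def mad_outliers(values, threshold=3.5):
    """Return values whose modified z-score exceeds the threshold.

    Hypothetical helper; 0.6745 and 3.5 are the commonly cited constants,
    not values confirmed by the khadee-eda source.
    """
    center = median(values)
    mad = median([abs(v - center) for v in values])
    if mad == 0:
        # No spread around the median, so the score is undefined.
        return []
    return [v for v in values if 0.6745 * abs(v - center) / mad > threshold]


readings = [10, 12, 11, 13, 12, 11, 95]
flagged = mad_outliers(readings)
print(flagged)
```

Because both the center and the spread are medians, a single extreme value cannot inflate the scale the way it inflates a standard deviation in the plain Z-score method.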

🌍 Global EDA Techniques

The Advanced Stats section includes 4 distinct regional analytical philosophies:

🇺🇸 US (ML-Readiness & Feature Engineering): Identifies feature importance, flags target leakage, and proposes engineered features.
🇮🇳 India (Statistical Foundations & Hypothesis Testing): Evaluates confidence intervals, conducts hypothesis testing, and fits target distributions.
🇯🇵 Japan (Quality Control & Process Analytics — Kaizen): Implements Shewhart control charts, calculates Process Capability Indexes ($C_p$/$C_{pk}$), checks stability indicators, and generates Pareto charts.
🇨🇳 China (Large-Scale Pattern Recognition): Generates PCA projections, evaluates Hopkins clustering statistics, provides K-Means elbow curves, and profiles data density.
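The process capability indexes mentioned under the Japan techniques follow the standard definitions $C_p = (USL - LSL) / 6\sigma$ and $C_{pk} = \min(USL - \mu, \mu - LSL) / 3\sigma$. A minimal sketch, assuming the overall sample standard deviation is used for $\sigma$ and that the specification limits are supplied by the user (the function name `process_capability` is hypothetical):

```python
from statistics import mean, stdev


def process_capability(samples, lsl, usl):
    """Return (Cp, Cpk) for measurements against lower/upper spec limits.

    Assumes sigma is the overall sample standard deviation; control-chart
    tooling often uses a within-subgroup estimate instead.
    """
    mu = mean(samples)
    sigma = stdev(samples)
    cp = (usl - lsl) / (6 * sigma)
    cpk = min(usl - mu, mu - lsl) / (3 * sigma)
    return cp, cpk


cp, cpk = process_capability([9.8, 10.0, 10.2, 10.0, 10.0], lsl=9.0, usl=11.0)
print(round(cp, 3), round(cpk, 3))
```

When the process is centered between the limits, the two indexes coincide; $C_{pk}$ drops below $C_p$ as the mean drifts toward either limit.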

📂 Supported Formats

No need to write separate loading code. khadee-eda automatically detects your dataset extension and uses optimized engines to parse it:

Format	Extensions	Parser
CSV / TSV	`.csv`, `.tsv`, `.txt`	Pandas optimized parser with latin-1 fallback
Excel	`.xlsx`, `.xls`, `.xlsm`, `.xlsb`	openpyxl / xlrd engine
JSON	`.json`	Standard and JSON-lines parsed dynamically
Parquet / Feather	`.parquet`, `.feather`	PyArrow engine
SQLite	`.db`, `.sqlite`, `.sqlite3`	Built-in SQLite connection reader
Pickle	`.pkl`, `.pickle`	Standard Python pickle serializer
Others	`.h5`, `.hdf5`, `.xml`, `.dta`, `.sas7bdat`, `.sav`	Supporting PyTables, XML, Stata, SAS, SPSS
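Extension-based loading of the kind described in the table can be sketched as a lookup from file suffix to a pandas reader, with a latin-1 retry for delimited text. This is an illustration only: `load_any`, `_read_delimited`, and the `READERS` mapping are hypothetical names, and only a subset of the formats is wired up.

```python
from functools import partial
from pathlib import Path

import pandas as pd


def _read_delimited(path, sep=","):
    """Read delimited text as UTF-8, retrying with latin-1 on decode errors."""
    try:
        return pd.read_csv(path, sep=sep)
    except UnicodeDecodeError:
        return pd.read_csv(path, sep=sep, encoding="latin-1")


# Hypothetical suffix-to-reader table covering part of the format list.
READERS = {
    ".csv": _read_delimited,
    ".tsv": partial(_read_delimited, sep="\t"),
    ".json": pd.read_json,
    ".parquet": pd.read_parquet,
    ".feather": pd.read_feather,
    ".pkl": pd.read_pickle,
    ".dta": pd.read_stata,
}


def load_any(path):
    """Pick a reader from the file extension and return a DataFrame."""
    path = Path(path)
    reader = READERS.get(path.suffix.lower())
    if reader is None:
        raise ValueError(f"Unsupported extension: {path.suffix!r}")
    return reader(path)
```

A call such as `load_any("train.csv")` then behaves like `pd.read_csv("train.csv")`, except that a file saved in latin-1 is decoded on the second attempt instead of raising.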

💾 Package Footprint & Download Size

Unlike heavier packages that bundle large compiled binaries, khadee-eda itself is pure Python, so it is small and fast to download.

1. Download Size (Pip / UV)

Wheel Size (.whl): ~85 KB
Source Distribution (.tar.gz): ~90 KB
Package Source Size: ~170 KB (Clean, pure Python logic + minimal glassmorphism style assets)

2. Dependency Size

If your machine already has standard data science packages (like pandas, numpy, scipy) cached, the installation completes instantly (~85 KB download). If installing into a blank virtual environment, pip/uv will download the scientific stack:

Dependency	Purpose	Download Size (Approx.)
pandas	Data manipulation & structure	~12 - 15 MB
numpy	Array computations	~14 - 18 MB
scipy	Advanced statistics & tests	~35 - 40 MB
scikit-learn	Machine learning engines & PCA	~7 - 9 MB
plotly	Dynamic SVG visualizations	~7 - 8 MB
pyarrow	High-performance Parquet storage	~30 - 35 MB
openpyxl	Excel read/write compatibility	~2 - 3 MB
jinja2	HTML templating engine	~0.2 MB
Total Dependencies	Full Scientific Stack	~110 - 130 MB

⚙️ Selective Reports

Save compute time and reduce HTML size on large datasets by rendering only the sections or techniques you need:

from k_eda import ProfileReport

# Profile only specific sections
report = ProfileReport(
    "dataset.csv", 
    sections=["overview", "variables", "model_readiness"]
)

# Render only specific global techniques
report = ProfileReport(
    "dataset.csv",
    techniques=["japan", "us"]
)

💎 Design & Visual Performance Excellence

Glassmorphism Dark Theme: Standard EDA reports often look like dated static tables. khadee-eda features a dark glassmorphism dashboard with neon accents, dynamic hover states, and smooth CSS micro-animations.
Instant PDF Export: Features a beautiful floating "Download PDF" button that triggers browser printing. The custom media print styles automatically expand all hidden column cards, expand all tabs, hide navigational elements/dropdowns, and switch to a crisp ink-saving light template for a clean, professional corporate report.
WebGL Crash Mitigation: Rendering dozens of ScatterGL plots on a single page can exceed a browser's WebGL context limit, which crashes the page or leaves charts blank. khadee-eda instead compiles scatter plots to optimized vector SVG path strings, which avoids the context limit without sacrificing interactive zoom or hover features.
Smart Dropdown Selectors: Instead of scrolling endlessly through dozens of columns, the report includes a dynamic select element to view one variable card at a time, instantly resizing the embedded Plotly chart to prevent layout distortions.
Copyable Preprocessing Recommender: When the library suggests cleaning operations (e.g., standardizing headers or imputing values), it displays a syntax-highlighted code block with a one-click copy button, generating context-aware code ready for your pipeline.

📦 Installation

To install khadee-eda in development mode locally:

git clone https://github.com/khadee/khadee-eda.git
cd khadee-eda
pip install -e .

To install with uv instead (recommended for faster installs):

uv pip install -e .

📄 License

This project is licensed under the MIT License — see the LICENSE file for details.

Project details

Release history

1.0.4 (this version), Jun 6, 2026
1.0.3, Jun 5, 2026
1.0.2, Jun 5, 2026
1.0.1, Jun 5, 2026
1.0.0, Jun 5, 2026

Download files


Source Distribution

khadee_eda-1.0.4.tar.gz (67.6 kB)

Uploaded Jun 6, 2026 Source

Built Distribution


khadee_eda-1.0.4-py3-none-any.whl (6.1 kB)

Uploaded Jun 6, 2026 Python 3

File details

Details for the file khadee_eda-1.0.4.tar.gz.

File metadata

Download URL: khadee_eda-1.0.4.tar.gz
Upload date: Jun 6, 2026
Size: 67.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.11

File hashes

Hashes for khadee_eda-1.0.4.tar.gz
Algorithm	Hash digest
SHA256	`65e70587806b467557c88000b229860a78a02a106d0425f84c45c56c211da48d`
MD5	`95134c729db12b81cf691497f82f4209`
BLAKE2b-256	`9ff98a439c93b4a0302b6de33b2bb68dc9e162dcbbf8723c27e740ad63a160db`
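A downloaded archive can be checked against the SHA256 digest listed above with the standard library. The sketch below is a generic illustration; the helper names `sha256_of` and `matches_published_hash` are hypothetical and not part of khadee-eda.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()
```

Passing the path of the downloaded `khadee_eda-1.0.4.tar.gz` together with the SHA256 value from the table should return `True` for an intact download.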


File details

Details for the file khadee_eda-1.0.4-py3-none-any.whl.

File metadata

Download URL: khadee_eda-1.0.4-py3-none-any.whl
Upload date: Jun 6, 2026
Size: 6.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.11

File hashes

Hashes for khadee_eda-1.0.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`5797deaa51159f563ff59bef9fd042e70f15e29e7984a17cd042e79afb72fdb5`
MD5	`19656014bdc5fb98b6cd6a2c39535801`
BLAKE2b-256	`2a76752312b7ae43481cd036326b75261d8d0fe5d46c5715513c628444149f5a`

