Automatic Exploratory Data Analysis, Cleaning, Validation, Visualization, and Smart Insights on ANY dataset.
AutoEDA
A production-ready Python package that performs automatic Exploratory Data Analysis (EDA), Data Cleaning, Data Validation, Visualization, and Smart Insights on ANY dataset.
Features
- Multi-Format Support: Analyze .csv, .xlsx, .xls, and .json effortlessly.
- Dataset Health Score: Instantly see a 0-100 score with category (Excellent/Good/Fair/Poor).
- Chat With Dataset: Ask natural language questions about your data interactively.
- Executive Summary: Get a concise, manager-friendly overview of your dataset.
- Streamlit Dashboard: Launch an interactive web interface with autoeda dashboard.
- PDF Report Generation: Professional PDF report with tables, charts, and insights.
- Smart Insights Engine: Generates 10+ business-style data observations automatically.
- Cleaning Recommendations: Actionable, numbered cleaning steps for your specific data.
- Large Dataset Mode: Samples the data and reduces memory use for files over 100 MB or datasets over 100k rows.
- Cleaned Data Export: Export cleaned data straight from the CLI.
- Performance Optimized: Optional Polars backend for lightning-fast loading of large datasets.
- Visualizations: Automatically generates relevant charts using Matplotlib and Seaborn.
- Rich Terminal UI: Beautiful, organized CLI reports.
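The Large Dataset Mode limits in the feature list can be illustrated with a small standard-library sketch. This is not AutoEDA's implementation: the helper names (`needs_sampling`, `reservoir_sample`), the choice of reservoir sampling, and reading "100MB" as 100 * 1024 * 1024 bytes are all assumptions made for illustration. Only the 100 MB and 100k-row limits come from the list.

```python
import random

# Limits taken from the feature list; everything else in this sketch is hypothetical.
MAX_BYTES = 100 * 1024 * 1024  # "100MB", interpreted here as binary megabytes
MAX_ROWS = 100_000             # "100k rows"


def needs_sampling(size_bytes: int, row_count: int) -> bool:
    """Return True when either documented limit is exceeded."""
    return size_bytes > MAX_BYTES or row_count > MAX_ROWS


def reservoir_sample(rows, k, seed=0):
    """Keep a uniform random sample of k items from an iterable of unknown length."""
    rng = random.Random(seed)
    sample = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(row)
        else:
            # Replace an existing item with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = row
    return sample
```

Reservoir sampling is used here because it needs only one pass and a fixed amount of memory, which suits files too large to load at once. The package may use a different strategy.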
Installation
# Basic installation
pip install .
# Installation with Polars backend for large datasets
# (quoted so the shell does not treat the brackets as a glob pattern)
pip install ".[fast]"
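To confirm whether the optional backend is present after installing, a check along these lines may help. It assumes the `fast` extra installs a package importable as `polars`, which the install comment suggests but does not state outright. The function name `polars_available` is hypothetical.

```python
import importlib.util


def polars_available() -> bool:
    """Return True if a top-level `polars` package can be found in this environment."""
    return importlib.util.find_spec("polars") is not None


if __name__ == "__main__":
    print(f"Polars installed: {polars_available()}")
```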
Usage
CLI
# Complete analysis
autoeda data.csv
autoeda data.xlsx
autoeda data.json
# Clean data and export to cleaned_data.csv
autoeda data.csv --clean
# Generate a PDF report
autoeda data.csv --report
# Executive summary
autoeda data.csv --summary
# Chat with your dataset
autoeda data.csv --ask
# Only visualizations
autoeda data.csv --visualize
# Run everything
autoeda data.csv --all
# Launch Interactive Dashboard
autoeda dashboard
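To run the same command over several files, the CLI can be driven from a short script. The sketch below uses only the flags shown in the examples above. The helper names (`build_command`, `run_batch`) are hypothetical, and what a given exit code means is not documented here, so the script simply records it.

```python
import subprocess

# Flags that appear in the CLI examples above; no other options are assumed.
DOCUMENTED_FLAGS = {"--clean", "--report", "--summary", "--ask", "--visualize", "--all"}


def build_command(path: str, *flags: str) -> list[str]:
    """Build an `autoeda` argument list, rejecting flags not shown in the examples."""
    unknown = [flag for flag in flags if flag not in DOCUMENTED_FLAGS]
    if unknown:
        raise ValueError(f"Undocumented flag(s): {unknown}")
    return ["autoeda", path, *flags]


def run_batch(paths, *flags):
    """Run the CLI once per file and return a {path: exit code} mapping."""
    results = {}
    for path in paths:
        # check=False: record the exit code instead of raising on failure.
        completed = subprocess.run(build_command(path, *flags), check=False)
        results[path] = completed.returncode
    return results
```

For example, `run_batch(["data.csv", "data.json"], "--summary")` would invoke `autoeda data.csv --summary` and then `autoeda data.json --summary`, provided the `autoeda` command is on the PATH.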
Python API
from autoeda.cli import analyze
# Complete analysis
df = analyze("data.csv")
# Executive summary only
analyze("data.csv", summary=True)
# Clean data
df_clean = analyze("data.csv", clean=True)
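When calling the API from other code, it can be useful to reject unsupported files before any analysis starts. The wrapper below is a sketch, not part of AutoEDA: `is_supported` and `safe_analyze` are hypothetical names, the extension set is copied from the feature list, and matching extensions case-insensitively is an assumption. Only the `analyze` import and its `summary`/`clean` keyword arguments come from the examples above.

```python
from pathlib import Path

# Extensions named in the feature list.
SUPPORTED_EXTENSIONS = {".csv", ".xlsx", ".xls", ".json"}


def is_supported(path: str) -> bool:
    """Check the file extension against the formats listed in the features."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


def safe_analyze(path: str, **options):
    """Hypothetical wrapper: validate the extension, then delegate to analyze()."""
    if not is_supported(path):
        raise ValueError(f"Unsupported file type: {Path(path).suffix or '(none)'}")
    # Imported lazily so this sketch can be loaded without AutoEDA installed.
    from autoeda.cli import analyze
    return analyze(path, **options)
```

With this in place, `safe_analyze("data.csv", clean=True)` forwards to `analyze("data.csv", clean=True)`, while `safe_analyze("data.parquet")` raises `ValueError` without importing the package.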
File details
Details for the file data_autoeda-2.0.0.tar.gz.
File metadata
- Download URL: data_autoeda-2.0.0.tar.gz
- Upload date:
- Size: 16.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1b3670ea793a4af2a6a45cf2ea326c7fbdbe31b394875536e758f20cd213415f |
| MD5 | 3ae5a731f429817f85465969c1dd8137 |
| BLAKE2b-256 | 8937aa3eae35fcd39a3488407aa69b24e93537683eb9ea2ec1ddbfb8c3b6b25f |
File details
Details for the file data_autoeda-2.0.0-py3-none-any.whl.
File metadata
- Download URL: data_autoeda-2.0.0-py3-none-any.whl
- Upload date:
- Size: 20.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 67242d798fedec4cfe29c49090130bb79a25d520394bea61919d7169d1e38a97 |
| MD5 | 75402f22a184af43372ac8e11960ee8c |
| BLAKE2b-256 | f737f5911a73416eaafe09a386634066771896d785e355e5eb9b74a260f1ac4f |