One-line data exploration for developers & data scientists
SpeedyEDA - Production-Ready Data Exploration
Professional-grade exploratory data analysis in one command.
Stop writing boilerplate! SpeedyEDA gives you complete exploratory data analysis in seconds, now with advanced statistical methods, data quality alerts, and interactive HTML reports that rival industry-standard tools.
Love this project? ⭐ Star us on GitHub!
Your star helps others discover SpeedyEDA and motivates us to keep improving!
What's New in v0.2.0
Advanced Statistical Analysis
- Multiple correlation methods (Pearson, Spearman, Kendall)
- Enhanced outlier detection (IQR + Z-score)
- Detailed quantile statistics (5th, 95th percentiles)
- Normality tests (Shapiro-Wilk)
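The two outlier rules named above can be sketched in plain Python. This is an illustration of the general IQR and Z-score techniques, not SpeedyEDA's own implementation; the function names `iqr_outliers` and `zscore_outliers` and their default thresholds are assumptions made for this example.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def zscore_outliers(values, threshold=3.0):
    """Return values whose absolute Z-score exceeds the threshold."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # A constant column has no spread, so nothing can be an outlier.
        return []
    return [v for v in values if abs(v - mean) / std > threshold]


data = [10, 12, 11, 13, 12, 11, 10, 12, 95]
print(iqr_outliers(data))
print(zscore_outliers(data, threshold=2.0))
```

On a small sample, a single extreme value inflates the standard deviation, so the Z-score rule with a threshold of 3 can miss a point that the IQR rule flags. That is one reason to report both.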
Automated Data Quality Alerts
- Multicollinearity detection
- High cardinality warnings
- Duplicate row detection
- Class imbalance analysis
- Excessive missing value alerts
- Mixed data type detection
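As a rough picture of what checks like these involve, here is a dependency-free sketch that scans a list of dict rows. It is not SpeedyEDA's code: the function `quality_alerts`, its threshold parameters, and the alert wording are all hypothetical, and multicollinearity is left out to keep the example short.

```python
from collections import Counter


def quality_alerts(rows, target=None, missing_threshold=0.3,
                   cardinality_threshold=0.8, imbalance_threshold=0.8):
    """Return human-readable alert strings for a list of dict rows."""
    alerts = []
    n = len(rows)
    if n == 0:
        return alerts

    # Duplicate rows: identical key/value pairs seen more than once.
    seen = Counter(tuple(sorted(row.items())) for row in rows)
    duplicates = sum(count - 1 for count in seen.values())
    if duplicates:
        alerts.append(f"{duplicates} duplicate row(s)")

    columns = sorted({key for row in rows for key in row})
    for col in columns:
        present = [row[col] for row in rows if row.get(col) is not None]

        # Excessive missing values.
        missing_ratio = 1 - len(present) / n
        if missing_ratio > missing_threshold:
            alerts.append(f"{col}: {missing_ratio:.0%} missing")

        # High cardinality: most non-missing string values are unique.
        if present and all(isinstance(v, str) for v in present):
            if len(set(present)) / len(present) > cardinality_threshold:
                alerts.append(f"{col}: high cardinality")

        # Mixed data types within one column.
        types = {type(v).__name__ for v in present}
        if len(types) > 1:
            alerts.append(f"{col}: mixed types ({', '.join(sorted(types))})")

    # Class imbalance on an optional target column.
    if target is not None:
        counts = Counter(row.get(target) for row in rows)
        label, top = counts.most_common(1)[0]
        if top / n > imbalance_threshold:
            alerts.append(f"{target}: class imbalance ({label!r} is {top / n:.0%})")
    return alerts


rows = [
    {"id": "a1", "price": 10.0, "churned": "no"},
    {"id": "a2", "price": None, "churned": "no"},
    {"id": "a3", "price": "12", "churned": "no"},
    {"id": "a3", "price": "12", "churned": "no"},
    {"id": "a4", "price": None, "churned": "no"},
    {"id": "a5", "price": 9.5, "churned": "yes"},
]
for alert in quality_alerts(rows, target="churned"):
    print(alert)
```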
Interactive HTML Reports
- Beautiful Plotly visualizations
- Click-to-zoom charts
- Standalone HTML files
- Professional styling
SpeedyEDA vs The Competition
| Feature | SpeedyEDA | ydata-profiling | Sweetviz | D-Tale |
|---|---|---|---|---|
| Basic Statistics | โ | โ | โ | โ |
| Multiple Correlations | โ Pearson/Spearman/Kendall | โ | โ | โ |
| Outlier Detection | โ IQR + Z-score | โ | โ | โ ๏ธ |
| Data Quality Alerts | โ 8 types | โ | โ ๏ธ Limited | โ |
| Interactive HTML | โ Plotly | โ | โ | โ Flask |
| Dataset Comparison | ๐ v0.3.0 | โ | โ | โ |
| Target Analysis | ๐ v0.3.0 | โ | โ | โ ๏ธ |
| Speed (10K rows) | <1s | ~10s | ~5s | ~3s |
| Installation Size | ~100MB | ~500MB | ~200MB | ~300MB |
| One-Line CLI | โ | โ | โ | โ |
| Fun Mode | โ ๐ | โ | โ | โ |
Bottom Line: SpeedyEDA combines the speed of simple tools with the features of professional ones, plus a delightful UX.
Core Features
- Automatic Statistics - Mean, median, mode, std, skewness, kurtosis, detailed quantiles
- Advanced Missing Value Analysis - Patterns, correlations, recommendations
- Auto Visualizations - Histograms, boxplots, correlation heatmaps (static + interactive)
- Multiple Correlation Methods - Pearson (linear), Spearman (monotonic), Kendall (ordinal)
- Smart Outlier Detection - IQR method + Z-score with configurable thresholds
- Data Quality Alerts - Multicollinearity, high cardinality, duplicates, class imbalance, and more
- Beautiful Terminal Output - Colorful, emoji-rich displays using `rich`
- Interactive HTML Reports - Professional Plotly-based reports with click-to-zoom
- Smart Presets - Pre-configured for ecommerce, surveys, finance
- Plugin System - Extend with custom visualizations and metrics
- Interactive Mode - Guided column and plot selection
- Batch Processing - Analyze multiple datasets at once
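To see why the list distinguishes Pearson (linear), Spearman (monotonic), and Kendall (ordinal), the sketch below implements textbook versions of all three in plain Python and applies them to a curved but strictly increasing relationship. The helper names are made up for this example, the Kendall variant is the simple tau-a without tie correction, and none of this is taken from SpeedyEDA's source.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation: strength of a linear relationship."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def kendall_tau_a(x, y):
    """Kendall tau-a: (concordant - discordant) pairs / all pairs."""
    score = 0
    pairs = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        score += (product > 0) - (product < 0)
        pairs += 1
    return score / pairs


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but not linear
print(f"pearson  = {pearson(x, y):.3f}")
print(f"spearman = {spearman(x, y):.3f}")
print(f"kendall  = {kendall_tau_a(x, y):.3f}")
```

The rank-based measures report a perfect monotonic association here, while Pearson stays below 1 because the relationship is not a straight line.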
Quick Start
Installation
pip install speedyeda
Basic Usage
# Full analysis with data quality alerts
fasteda sales.csv --fun
# Generate interactive HTML report
fasteda data.csv --html report.html
# Use preset with plots and HTML
fasteda products.csv --preset ecommerce --plots --html ecommerce_report.html
# Interactive mode
fasteda survey.xlsx --interactive
# Batch processing with HTML reports
fasteda file1.csv file2.csv file3.csv --batch --html
# Disable advanced features for speed
fasteda huge_dataset.csv --no-advanced
Python API
import pandas as pd
from fasteda import analyze, save_report
df = pd.read_csv("sales.csv")
# Full analysis with advanced features
results = analyze(df, fun=True, advanced=True)
# Check data quality alerts
if results['quality_alerts']:
    for alert in results['quality_alerts']:
        print(alert.message)
# Multiple correlation methods
correlations = results['advanced_correlations']
print(correlations['spearman']) # Spearman correlation matrix
# Outlier detection
outliers = results['outliers']
for col, info in outliers.items():
    print(f"{col}: {info['count']} outliers ({info['percentage']:.1f}%)")
# Save detailed report
save_report(results, "sales_report.json")
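Once the outlier results are in hand, a small helper can rank the columns that need attention first. The sketch below assumes only the shape used in the loop above (a dict mapping column names to dicts with `count` and `percentage` keys). The helper name `worst_outlier_columns`, its `min_percentage` parameter, and the sample data are hypothetical and hand-written, not real SpeedyEDA output.

```python
def worst_outlier_columns(outliers, min_percentage=5.0):
    """Return (column, percentage) pairs at or above min_percentage, worst first."""
    flagged = [
        (col, info["percentage"])
        for col, info in outliers.items()
        if info["percentage"] >= min_percentage
    ]
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Hand-written stand-in for results['outliers']; the values are invented.
sample_outliers = {
    "price": {"count": 12, "percentage": 6.0},
    "quantity": {"count": 1, "percentage": 0.5},
    "discount": {"count": 30, "percentage": 15.0},
}

for col, pct in worst_outlier_columns(sample_outliers):
    print(f"{col}: {pct:.1f}% outliers")
```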
CLI Options
| Flag | Description |
|---|---|
| `--fun` | Emojis and colorful output (highly recommended!) |
| `--html <file>` | NEW! Generate interactive HTML report |
| `--no-advanced` | Disable advanced features for faster processing |
| `--summary` | Plain text summary with insights |
| `--plots` | Generate and save static visualizations |
| `--save <file>` | Export report (JSON/TXT) |
| `--interactive` | Interactive column/plot selection |
| `--preset <name>` | Use preset (ecommerce, survey, finance) |
| `--columns <cols>` | Analyze specific columns only |
| `--batch` | Process multiple files |
| `--quiet` | Suppress terminal output |
Smart Presets
SpeedyEDA includes built-in presets tailored for common scenarios:
- ecommerce - Product analysis, sales trends, customer behavior
- survey - Response distributions, sentiment analysis, demographics
- finance - Time series, correlations, risk metrics
- general - Comprehensive all-purpose exploration
fasteda sales.csv --preset ecommerce --plots --fun
Extend with Plugins
Build custom analysis functions:
from fasteda.plugins import register_plugin
@register_plugin("outlier_detection")
def detect_outliers(df, threshold=1.5):
    # Replace this placeholder with your custom analysis
    results = {}
    return results
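For readers curious how a decorator-based registry of this kind can work, here is a minimal self-contained sketch. It is a generic pattern, not the actual `fasteda.plugins` implementation: the `_PLUGINS` dict, the local `register_plugin` stand-in, and `run_plugins` are all assumptions for illustration.

```python
_PLUGINS = {}  # maps plugin name -> callable


def register_plugin(name):
    """Return a decorator that stores the decorated function under `name`."""
    def decorator(func):
        if name in _PLUGINS:
            raise ValueError(f"plugin {name!r} is already registered")
        _PLUGINS[name] = func
        return func  # leave the function usable on its own
    return decorator


def run_plugins(data):
    """Call every registered plugin on `data` and collect results by name."""
    return {name: func(data) for name, func in _PLUGINS.items()}


@register_plugin("row_count")
def row_count(rows):
    return len(rows)


@register_plugin("column_names")
def column_names(rows):
    return sorted({key for row in rows for key in row})


print(run_plugins([{"a": 1, "b": 2}, {"a": 3}]))
```

Returning the original function from the decorator keeps plugins callable directly, which makes them easy to unit test outside the registry.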
Supported Formats
- CSV (`.csv`)
- Excel (`.xlsx`, `.xls`)
- JSON (`.json`)
- Parquet (`.parquet`)
Why SpeedyEDA?
Before SpeedyEDA:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_csv("data.csv")
print(df.describe())
print(df.info())
print(df.isnull().sum())
plt.figure(figsize=(10,6))
# ... 20+ more lines of boilerplate ...
With SpeedyEDA:
fasteda data.csv --fun
One command. Complete analysis. Beautiful output.
Contributing
We'd love your help making SpeedyEDA even better!
- Found a bug? Open an issue
- Have an idea? Start a discussion
- Want to contribute? Submit a PR
- ⭐ Love SpeedyEDA? Star the repo!
License
MIT License - see LICENSE file for details.
Made with ❤️ by Dawaman
If SpeedyEDA saves you time, ⭐ star the repo to show your support!
Report Bug · Request Feature · Documentation