
🚀 One-line data exploration for developers & data scientists


🚀 SpeedyEDA - Production-Ready Data Exploration


Professional-grade exploratory data analysis in one command.

Stop writing boilerplate! SpeedyEDA gives you a complete exploratory data analysis in seconds, now with advanced statistical methods, automated data quality alerts, and interactive HTML reports designed to rival industry-standard tools.


💡 Love this project? ⭐ Star us on GitHub!

Your star helps others discover SpeedyEDA and motivates us to keep improving! 🙏


🆕 What's New in v0.2.0

✨ Advanced Statistical Analysis

  • Multiple correlation methods (Pearson, Spearman, Kendall)
  • Enhanced outlier detection (IQR + Z-score)
  • Detailed quantile statistics (5th, 95th percentiles)
  • Normality tests (Shapiro-Wilk)
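To make these statistics concrete, here is a minimal pandas/SciPy sketch of what each one computes. It is not SpeedyEDA's code: the helper name `advanced_stats`, the returned dictionary layout, and the default significance level `alpha=0.05` are assumptions for illustration. SciPy must be installed (pandas also relies on it for Kendall correlation).

```python
import numpy as np
import pandas as pd
from scipy import stats


def advanced_stats(df: pd.DataFrame, alpha: float = 0.05) -> dict:
    """Hypothetical helper: correlation matrices, tail quantiles, normality tests."""
    numeric = df.select_dtypes(include="number")

    # Pearson measures linear association; Spearman and Kendall work on ranks,
    # so they capture any monotonic relationship.
    correlations = {
        method: numeric.corr(method=method)
        for method in ("pearson", "spearman", "kendall")
    }

    # 5th and 95th percentiles of every numeric column (rows = quantile level).
    quantiles = numeric.quantile([0.05, 0.95])

    # Shapiro-Wilk test per column: a p-value below alpha rejects normality.
    normality = {}
    for col in numeric.columns:
        values = numeric[col].dropna()
        if len(values) >= 3:  # scipy.stats.shapiro needs at least 3 observations
            p_value = stats.shapiro(values).pvalue
            normality[col] = {"p_value": float(p_value), "normal": bool(p_value >= alpha)}

    return {"correlations": correlations, "quantiles": quantiles, "normality": normality}


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    demo = pd.DataFrame({"x": rng.normal(size=200)})
    demo["y"] = demo["x"] ** 3  # monotonic but non-linear in x
    result = advanced_stats(demo)
    print(result["correlations"]["pearson"].loc["x", "y"])   # clearly below 1
    print(result["correlations"]["spearman"].loc["x", "y"])  # close to 1
    print(result["normality"]["y"])
```

The demo uses `y = x**3`, a monotonic but non-linear relationship: the rank-based Spearman and Kendall coefficients report a perfect association, while Pearson does not.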

🚨 Automated Data Quality Alerts

  • Multicollinearity detection
  • High cardinality warnings
  • Duplicate row detection
  • Class imbalance analysis
  • Excessive missing value alerts
  • Mixed data type detection
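The alert types above can be approximated with simple rule-based checks. The sketch below is a hypothetical illustration, not SpeedyEDA's implementation: the function `quality_alerts`, its message strings, and every threshold (|r| > 0.9, more than 50% distinct values, more than 50% missing, rarest class under 10%) are assumed defaults chosen for the example.

```python
from typing import List, Optional

import pandas as pd


def quality_alerts(
    df: pd.DataFrame,
    target: Optional[str] = None,
    corr_threshold: float = 0.9,
    cardinality_ratio: float = 0.5,
    missing_threshold: float = 0.5,
    imbalance_threshold: float = 0.1,
) -> List[str]:
    """Hypothetical rule-based checks for the alert types listed above."""
    alerts: List[str] = []

    # Multicollinearity: numeric column pairs whose |Pearson r| exceeds the threshold.
    corr = df.select_dtypes(include="number").corr().abs()
    cols = list(corr.columns)
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            if corr.loc[a, b] > corr_threshold:
                alerts.append(f"multicollinearity: {a} ~ {b} (|r|={corr.loc[a, b]:.2f})")

    for col in df.columns:
        series = df[col]
        # Excessive missing values.
        missing_share = series.isna().mean()
        if missing_share > missing_threshold:
            alerts.append(f"missing values: {col} ({missing_share:.0%} missing)")
        # High cardinality: a non-numeric column with many distinct values.
        if (not pd.api.types.is_numeric_dtype(series)
                and series.nunique() > cardinality_ratio * len(df)):
            alerts.append(f"high cardinality: {col}")
        # Mixed types: more than one Python type among the non-null values.
        if series.dtype == object and len({type(v) for v in series.dropna()}) > 1:
            alerts.append(f"mixed types: {col}")

    # Duplicate rows (every column identical to an earlier row).
    n_dup = int(df.duplicated().sum())
    if n_dup:
        alerts.append(f"duplicates: {n_dup} duplicated row(s)")

    # Class imbalance: share of the rarest class in an optional target column.
    if target is not None:
        shares = df[target].value_counts(normalize=True)
        if shares.min() < imbalance_threshold:
            alerts.append(f"class imbalance: {target} (rarest class {shares.min():.0%})")

    return alerts
```

Passing `target=` enables the class-imbalance check, since imbalance is only meaningful for a label column.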

📊 Interactive HTML Reports

  • Beautiful Plotly visualizations
  • Click-to-zoom charts
  • Standalone HTML files
  • Professional styling

📊 SpeedyEDA vs The Competition

| Feature | SpeedyEDA | ydata-profiling | Sweetviz | D-Tale |
|---|---|---|---|---|
| Basic Statistics | ✅ | ✅ | ✅ | ✅ |
| Multiple Correlations | ✅ Pearson/Spearman/Kendall | ✅ | ✅ | ✅ |
| Outlier Detection | ✅ IQR + Z-score | ✅ | ✅ | ⚠️ |
| Data Quality Alerts | ✅ 8 types | ✅ | ⚠️ Limited | ❌ |
| Interactive HTML | ✅ Plotly | ✅ | ✅ | ✅ Flask |
| Dataset Comparison | 🔜 v0.3.0 | ❌ | ✅ | ❌ |
| Target Analysis | 🔜 v0.3.0 | ✅ | ✅ | ⚠️ |
| Speed (10K rows) | ⚡ <1 s | ~10 s | ~5 s | ~3 s |
| Installation Size | 📦 ~100 MB | ~500 MB | ~200 MB | ~300 MB |
| One-Line CLI | ✅ | ❌ | ❌ | ❌ |
| Fun Mode | ✅ 🎉 | ❌ | ❌ | ❌ |

Bottom Line: SpeedyEDA combines the speed of simple tools with the features of professional ones, plus a delightful UX.


✨ Core Features

  • 📊 Automatic Statistics - Mean, median, mode, std, skewness, kurtosis, detailed quantiles
  • 🔍 Advanced Missing Value Analysis - Patterns, correlations, recommendations
  • 📈 Auto Visualizations - Histograms, boxplots, correlation heatmaps (static + interactive)
  • 🔗 Multiple Correlation Methods - Pearson (linear), Spearman (monotonic), Kendall (ordinal)
  • 🎯 Smart Outlier Detection - IQR method + Z-score with configurable thresholds
  • 🚨 Data Quality Alerts - Multicollinearity, high cardinality, duplicates, class imbalance, and more
  • 🎨 Beautiful Terminal Output - Colorful, emoji-rich displays using rich
  • 📄 Interactive HTML Reports - Professional Plotly-based reports with click-to-zoom
  • 🔧 Smart Presets - Pre-configured for ecommerce, surveys, finance
  • 🔌 Plugin System - Extend with custom visualizations and metrics
  • 🤝 Interactive Mode - Guided column and plot selection
  • 📦 Batch Processing - Analyze multiple datasets at once
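The IQR and Z-score outlier methods listed above are standard rules with one tunable parameter each. Below is a minimal pandas sketch, assuming the conventional defaults of 1.5 * IQR (Tukey's fences) and |z| > 3; the function name `outlier_summary` and its result keys are hypothetical and not part of the SpeedyEDA API.

```python
import pandas as pd


def outlier_summary(series: pd.Series, iqr_k: float = 1.5, z_threshold: float = 3.0) -> dict:
    """Hypothetical helper: count outliers in one numeric column with two rules."""
    values = series.dropna()
    if values.empty:
        return {"iqr_count": 0, "zscore_count": 0, "iqr_percentage": 0.0}

    # IQR rule (Tukey's fences): outside [Q1 - k*IQR, Q3 + k*IQR].
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    iqr_mask = (values < q1 - iqr_k * iqr) | (values > q3 + iqr_k * iqr)

    # Z-score rule: more than z_threshold sample standard deviations from the mean.
    std = values.std()
    if pd.notna(std) and std > 0:
        z_mask = ((values - values.mean()) / std).abs() > z_threshold
    else:  # constant column or a single value: no spread, so no Z-score outliers
        z_mask = pd.Series(False, index=values.index)

    return {
        "iqr_count": int(iqr_mask.sum()),
        "zscore_count": int(z_mask.sum()),
        "iqr_percentage": 100.0 * float(iqr_mask.mean()),
    }
```

The two rules can disagree: the IQR rule is based on quartiles and is robust to the outliers themselves, whereas a single extreme value inflates the standard deviation used by the Z-score and can mask other outliers.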

🚀 Quick Start

Installation

pip install speedyeda

Basic Usage

# Full analysis with data quality alerts
fasteda sales.csv --fun

# Generate interactive HTML report
fasteda data.csv --html report.html

# Use preset with plots and HTML
fasteda products.csv --preset ecommerce --plots --html ecommerce_report.html

# Interactive mode
fasteda survey.xlsx --interactive

# Batch processing with HTML reports
fasteda file1.csv file2.csv file3.csv --batch --html

# Disable advanced features for speed
fasteda huge_dataset.csv --no-advanced

Python API

import pandas as pd
from fasteda import analyze, save_report

df = pd.read_csv("sales.csv")

# Full analysis with advanced features
results = analyze(df, fun=True, advanced=True)

# Check data quality alerts
if results['quality_alerts']:
    for alert in results['quality_alerts']:
        print(alert.message)

# Multiple correlation methods
correlations = results['advanced_correlations']
print(correlations['spearman'])  # Spearman correlation matrix

# Outlier detection
outliers = results['outliers']
for col, info in outliers.items():
    print(f"{col}: {info['count']} outliers ({info['percentage']:.1f}%)")

# Save detailed report
save_report(results, "sales_report.json")

📋 CLI Options

| Flag | Description |
|---|---|
| --fun | 🎉 Emojis and colorful output (highly recommended!) |
| --html <file> | 📊 NEW! Generate interactive HTML report |
| --no-advanced | ⚡ Disable advanced features for faster processing |
| --summary | 📝 Plain text summary with insights |
| --plots | 📊 Generate and save static visualizations |
| --save <file> | 💾 Export report (JSON/TXT) |
| --interactive | 🤝 Interactive column/plot selection |
| --preset <name> | 🎯 Use preset (ecommerce, survey, finance) |
| --columns <cols> | 🎯 Analyze specific columns only |
| --batch | 📦 Process multiple files |
| --quiet | 🤫 Suppress terminal output |

🎯 Smart Presets

SpeedyEDA includes built-in presets tailored for common scenarios:

  • 📦 ecommerce - Product analysis, sales trends, customer behavior
  • 📋 survey - Response distributions, sentiment analysis, demographics
  • 💰 finance - Time series, correlations, risk metrics
  • 🔧 general - Comprehensive all-purpose exploration

fasteda sales.csv --preset ecommerce --plots --fun

🔌 Extend with Plugins

Build custom analysis functions:

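from fasteda.plugins import register_plugin

@register_plugin("outlier_detection")
def detect_outliers(df, threshold=1.5):
    # Your custom analysis, e.g. count IQR outliers per numeric column
    results = {}
    for col in df.select_dtypes(include="number"):
        q1, q3 = df[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        mask = (df[col] < q1 - threshold * iqr) | (df[col] > q3 + threshold * iqr)
        results[col] = int(mask.sum())
    return results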
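A decorator like `register_plugin` is typically backed by a small registry keyed by name. The following is a hypothetical, self-contained sketch of that pattern, not SpeedyEDA's actual plugin machinery; `PLUGINS`, `run_plugins`, and the example `row_count` plugin are illustrative names.

```python
from typing import Any, Callable, Dict

import pandas as pd

# Hypothetical global registry: plugin name -> analysis function.
PLUGINS: Dict[str, Callable[[pd.DataFrame], Any]] = {}


def register_plugin(name: str) -> Callable[[Callable], Callable]:
    """Decorator factory that stores the decorated function under `name`."""
    def decorator(func: Callable) -> Callable:
        if name in PLUGINS:
            raise ValueError(f"plugin {name!r} is already registered")
        PLUGINS[name] = func
        return func  # unchanged, so it stays directly callable
    return decorator


def run_plugins(df: pd.DataFrame) -> Dict[str, Any]:
    """Run every registered plugin on `df` and collect results by plugin name."""
    return {name: func(df) for name, func in PLUGINS.items()}


@register_plugin("row_count")
def row_count(df: pd.DataFrame) -> int:
    return len(df)
```

Returning the original function from the decorator keeps it usable on its own, and rejecting duplicate names avoids silently replacing an existing plugin.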

📦 Supported Formats

  • 📄 CSV (.csv)
  • 📊 Excel (.xlsx, .xls)
  • 🗂️ JSON (.json)
  • ⚡ Parquet (.parquet)
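Supporting several formats usually comes down to dispatching on the file extension. Here is a minimal sketch using standard pandas readers, assuming the extension-to-reader mapping below; `load_table` and `load_batch` are hypothetical helpers, not SpeedyEDA functions. Reading Excel requires `openpyxl` (.xlsx) or `xlrd` (.xls), and Parquet requires `pyarrow` or `fastparquet`.

```python
from pathlib import Path
from typing import Dict, Iterable

import pandas as pd

# Assumed mapping from file extension to the pandas reader that handles it.
READERS = {
    ".csv": pd.read_csv,
    ".xlsx": pd.read_excel,
    ".xls": pd.read_excel,
    ".json": pd.read_json,
    ".parquet": pd.read_parquet,
}


def load_table(path: str) -> pd.DataFrame:
    """Pick a reader from the (case-insensitive) file extension."""
    suffix = Path(path).suffix.lower()
    try:
        reader = READERS[suffix]
    except KeyError:
        raise ValueError(f"unsupported file type: {suffix or path}") from None
    return reader(path)


def load_batch(paths: Iterable[str]) -> Dict[str, pd.DataFrame]:
    """Load several files, keyed by file name, e.g. for batch analysis."""
    return {Path(p).name: load_table(p) for p in paths}
```

Keying the result by file name is one plausible way to feed several datasets into a batch run.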

🌟 Why SpeedyEDA?

Before SpeedyEDA:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.read_csv("data.csv")
print(df.describe())
df.info()  # prints its summary directly (wrapping it in print() would also print "None")
print(df.isnull().sum())
plt.figure(figsize=(10,6))
# ... 20+ more lines of boilerplate ...

With SpeedyEDA:

fasteda data.csv --fun

✨ One command. Complete analysis. Beautiful output.

🤝 Contributing

We'd love your help making SpeedyEDA even better!

📄 License

MIT License - see LICENSE file for details.


Made with ❤️ by Dawaman

If SpeedyEDA saves you time, ⭐ star the repo to show your support!

🐛 Report Bug · 💡 Request Feature · 📖 Documentation



Download files


Source Distribution

speedyeda-0.2.0.tar.gz (30.1 kB)

Uploaded Source

Built Distribution


speedyeda-0.2.0-py3-none-any.whl (27.9 kB)

Uploaded Python 3

File details

Details for the file speedyeda-0.2.0.tar.gz.

File metadata

  • Download URL: speedyeda-0.2.0.tar.gz
  • Upload date:
  • Size: 30.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.11

File hashes

Hashes for speedyeda-0.2.0.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1b75fbd5d36f7b299fd41759118983bac6d546641caf31cbd5143031349f8866 |
| MD5 | 321b7daceefe9fe95afb83f80f9022aa |
| BLAKE2b-256 | 78aa28d6c0655b8da58cf5c00c472b92324183182756daab05b11ffbc6875f65 |
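To check a downloaded file against the SHA256 digests listed here, hash it locally and compare. Below is a minimal sketch using Python's standard `hashlib`; the helper names `file_sha256` and `verify_sha256` are illustrative.

```python
import hashlib


def file_sha256(path: str, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter(callable, sentinel) keeps reading until read() returns b"" (EOF).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path: str, expected: str) -> bool:
    """Compare a file's digest with a published one (case-insensitive hex)."""
    return file_sha256(path) == expected.strip().lower()
```

For example, `verify_sha256("speedyeda-0.2.0.tar.gz", "1b75fbd5d36f7b299fd41759118983bac6d546641caf31cbd5143031349f8866")` should return `True` for an unmodified copy of the source distribution.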


File details

Details for the file speedyeda-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: speedyeda-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 27.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.11

File hashes

Hashes for speedyeda-0.2.0-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 24b92654a0ff181f20c66ab1c1d18e05b18dc3bd35a411f4677190553fc76403 |
| MD5 | f37f5c6567216c4c0bbf5bb0a10da28b |
| BLAKE2b-256 | 3db9076f51807790c6e05564aa1b14977e7c1ea63a55c11ddf5aa1769bce8044 |

