Automates Exploratory Data Analysis (EDA) for any pandas DataFrame

smarteda

Automate your Exploratory Data Analysis in one line of code.

smarteda is a Python package that replaces repetitive EDA code. Instead of writing dozens of pandas lines for every new dataset, you pass the DataFrame to smarteda, which analyzes it, prints suggestions, and generates a full HTML report.


Installation

pip install smarteda

Quick Start

import pandas as pd
import smarteda

df = pd.read_csv("your_data.csv")

# Run everything at once
smarteda.analyze(df)

# Or generate a full HTML report
smarteda.report(df, output_file="report.html")
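
If no CSV file is at hand, a small synthetic DataFrame is enough to try the functions on. The following is a sketch only: the column names, sizes, and random seed are hypothetical and not part of smarteda.

```python
import numpy as np
import pandas as pd

# Hypothetical demo data: none of these names come from smarteda
rng = np.random.default_rng(seed=0)
n = 200
df = pd.DataFrame({
    "age": rng.integers(18, 65, size=n).astype(float),
    "salary": rng.lognormal(mean=10.5, sigma=0.6, size=n),
    "department": rng.choice(["sales", "ops", "eng"], size=n),
})

# Blank out some values so missing-value checks have something to report
df.loc[rng.choice(n, size=20, replace=False), "age"] = np.nan

# Append a few repeated rows so duplicate checks have something to report
df = pd.concat([df, df.head(5)], ignore_index=True)

print(df.shape)
```

The resulting `df` can be passed to `smarteda.analyze(df)` in place of the DataFrame read from a CSV file.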

Functions

Function                            Description
smarteda.basic_eda(df)              Head, tail, sample, shape, size, info, describe
smarteda.overview(df)               Shape, memory, data types, constant columns, wrong type detection
smarteda.missing(df)                Missing value counts, percentages, heatmap, fill suggestions
smarteda.duplicates(df)             Count and show duplicate rows
smarteda.duplicates(df, drop=True)  Drop duplicates and return clean DataFrame
smarteda.outliers(df)               IQR, Z-score, and Isolation Forest outlier detection
smarteda.distributions(df)          Skewness, kurtosis, transformation suggestions, histogram plots
smarteda.correlations(df)           Pearson/Spearman/Kendall correlation, multicollinearity warnings
smarteda.categorical(df)            Value counts, high cardinality detection, encoding suggestions
smarteda.timeseries(df)             Auto datetime detection, trends, seasonality, gap detection
smarteda.suggestions(df)            Smart recommendations + ML Readiness Score out of 100
smarteda.clean(df)                  Auto clean: returns a new cleaned DataFrame
smarteda.clean(df, inplace=True)    Auto clean: modifies original DataFrame directly
smarteda.visualize(df)              Auto charts for every column
smarteda.analyze(df)                Runs ALL functions above in one call
smarteda.report(df)                 Generates a full standalone HTML report

Examples

Basic EDA

smarteda.basic_eda(df)        # default 5 rows
smarteda.basic_eda(df, n=10)  # show 10 rows

Missing Values

smarteda.missing(df)
# Output:
#        Count  Percentage
# age       21       10.24
# salary    15        7.32
# Suggestion: age → Fill with mean | salary → Fill with median
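
For comparison, a count/percentage table like the one above can be built with plain pandas. This is a minimal sketch of the idea, not smarteda's actual implementation; the helper name `missing_summary` and the sample data are hypothetical.

```python
import numpy as np
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return missing-value counts and percentages for columns that have any."""
    counts = df.isna().sum()
    summary = pd.DataFrame({
        "Count": counts,
        "Percentage": (counts / len(df) * 100).round(2),
    })
    # Keep only columns with at least one missing value, worst first
    return summary[summary["Count"] > 0].sort_values("Count", ascending=False)


# Hypothetical sample data
sample = pd.DataFrame({
    "age": [25, np.nan, 31, np.nan],
    "salary": [50000, 62000, np.nan, 58000],
    "city": ["Pune", "Delhi", "Mumbai", "Pune"],
})
result = missing_summary(sample)
print(result)
```

Columns without missing values (here `city`) are left out of the summary, so the table stays short on wide datasets.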

Outlier Detection

smarteda.outliers(df)
# Output:
# salary → 8 outliers (3.9%) using IQR
# score  → 1 outliers (0.49%) using Z-score
# Multi-dimensional (Isolation Forest) → 39 outliers (19.02%)
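
The IQR and Z-score rules named in the output can be sketched in a few lines of pandas. The thresholds used here (1.5 × IQR, |z| above a cutoff) are the conventional choices and are an assumption; the source does not state which values smarteda uses. The function names and sample data are hypothetical, and Isolation Forest is not covered.

```python
import pandas as pd


def iqr_outliers(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Boolean mask: True where a value lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return (s < q1 - k * iqr) | (s > q3 + k * iqr)


def zscore_outliers(s: pd.Series, threshold: float = 3.0) -> pd.Series:
    """Boolean mask: True where the absolute z-score exceeds the threshold."""
    z = (s - s.mean()) / s.std(ddof=0)
    return z.abs() > threshold


# Hypothetical sample with one obviously extreme value
salary = pd.Series([40, 42, 45, 47, 50, 52, 55, 58, 60, 400], name="salary")

iqr_mask = iqr_outliers(salary)
# With only 10 values, |z| (population std) cannot exceed sqrt(10 - 1) = 3,
# so a lower cutoff is used for this small illustration.
z_mask = zscore_outliers(salary, threshold=2.5)

print(salary[iqr_mask].tolist())
print(salary[z_mask].tolist())
```

The two rules can disagree on real data: the IQR rule is based on quartiles and is less affected by the extreme values themselves, while the Z-score rule depends on the mean and standard deviation, which outliers inflate.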

Smart Suggestions + ML Score

smarteda.suggestions(df)
# Output:
# ⚠️  Column 'salary' is highly skewed → apply log transformation
# ⚠️  'height' and 'weight' are 94% correlated → drop one
# ✅  No duplicates found
# 💡 ML Readiness Score: 87 / 100
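
Warnings about skew and correlation like those above rest on two simple checks, sketched here with plain pandas. The cutoffs (absolute skewness above 1.0, absolute Pearson correlation above 0.9) are assumptions chosen for illustration, as are the helper names and sample data; this is not smarteda's implementation, and the ML Readiness Score is not reproduced.

```python
import pandas as pd


def skewed_columns(df: pd.DataFrame, threshold: float = 1.0) -> list:
    """Names of numeric columns whose absolute skewness exceeds the threshold."""
    skew = df.select_dtypes(include="number").skew()
    return skew[skew.abs() > threshold].index.tolist()


def correlated_pairs(df: pd.DataFrame, threshold: float = 0.9) -> list:
    """(col_a, col_b, |r|) for numeric column pairs above the threshold."""
    corr = df.select_dtypes(include="number").corr().abs()
    cols = list(corr.columns)
    pairs = []
    # Walk the upper triangle so each pair is reported once
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            r = float(corr.iloc[i, j])
            if r > threshold:
                pairs.append((cols[i], cols[j], r))
    return pairs


# Hypothetical sample data
sample = pd.DataFrame({
    "height": [150, 160, 170, 180, 190],
    "weight": [50, 61, 69, 80, 91],
    "salary": [30, 32, 35, 38, 500],
})
skewed = skewed_columns(sample)
pairs = correlated_pairs(sample)

print(skewed)
print([(a, b) for a, b, _ in pairs])
```
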

Auto Clean

# Safe — keeps original df intact
clean_df = smarteda.clean(df)

# Modifies df directly
smarteda.clean(df, inplace=True)

HTML Report

smarteda.report(df, output_file="my_report.html")
# The report is a standalone HTML file: open it in any browser, no extra tools needed

What smarteda Detects Automatically

  • ✅ Missing values with fill strategy per column
  • ✅ Duplicate rows
  • ✅ Outliers using 3 methods (IQR, Z-score, Isolation Forest)
  • ✅ Skewed distributions with transformation suggestions
  • ✅ Multicollinearity between features
  • ✅ High cardinality categorical columns
  • ✅ Wrong data types (numbers stored as strings, dates as objects)
  • ✅ Constant columns (useless for ML)
  • ✅ Time series trends, seasonality, and gaps
  • ✅ ML Readiness Score out of 100
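
Two of the checks in this list, constant columns and numbers stored as strings, can be approximated as follows. This is a sketch under stated assumptions rather than smarteda's own logic: the helper names and sample data are hypothetical, and a column counts as "numeric stored as string" only if every non-null value parses with `pd.to_numeric`.

```python
import pandas as pd
from pandas.api.types import is_object_dtype, is_string_dtype


def constant_columns(df: pd.DataFrame) -> list:
    """Columns with at most one distinct value (NaN counts as a value)."""
    return [col for col in df.columns if df[col].nunique(dropna=False) <= 1]


def numeric_strings(df: pd.DataFrame) -> list:
    """Text columns whose non-null values all parse as numbers."""
    found = []
    for col in df.columns:
        s = df[col]
        # Only inspect text-like columns; skip numeric, bool, datetime, etc.
        if not (is_object_dtype(s) or is_string_dtype(s)):
            continue
        values = s.dropna()
        if values.empty:
            continue
        parsed = pd.to_numeric(values, errors="coerce")
        if parsed.notna().all():
            found.append(col)
    return found


# Hypothetical sample data
sample = pd.DataFrame({
    "price": ["10.5", "20", "7.25"],
    "country": ["IN", "IN", "IN"],
    "name": ["a", "b", "c"],
})
const_cols = constant_columns(sample)
num_str_cols = numeric_strings(sample)

print(const_cols)
print(num_str_cols)
```
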

Dependencies

  • pandas
  • numpy
  • matplotlib
  • seaborn
  • scipy
  • scikit-learn
  • jinja2
  • missingno

License

MIT License
