
Automates Exploratory Data Analysis (EDA) for any pandas DataFrame

smarteda

Automate your Exploratory Data Analysis in one line of code.

smarteda is a Python package that eliminates repetitive EDA code. Instead of writing dozens of pandas lines for every new dataset, you call smarteda once: it analyzes the data, prints suggestions (such as fill strategies and transformations), and can generate a full HTML report.


Installation

pip install smarteda

Quick Start

import pandas as pd
import smarteda

df = pd.read_csv("your_data.csv")

# Run everything at once
smarteda.analyze(df)

# Or generate a full HTML report
smarteda.report(df, output_file="report.html")

Functions

Function                            Description
smarteda.basic_eda(df)              Head, tail, sample, shape, size, info, describe
smarteda.overview(df)               Shape, memory, data types, constant columns, wrong type detection
smarteda.missing(df)                Missing value counts, percentages, heatmap, fill suggestions
smarteda.duplicates(df)             Count and show duplicate rows
smarteda.duplicates(df, drop=True)  Drop duplicates and return clean DataFrame
smarteda.outliers(df)               IQR, Z-score, and Isolation Forest outlier detection
smarteda.distributions(df)          Skewness, kurtosis, transformation suggestions, histogram plots
smarteda.correlations(df)           Pearson/Spearman/Kendall correlation, multicollinearity warnings
smarteda.categorical(df)            Value counts, high cardinality detection, encoding suggestions
smarteda.timeseries(df)             Auto datetime detection, trends, seasonality, gap detection
smarteda.suggestions(df)            Smart recommendations + ML Readiness Score out of 100
smarteda.clean(df)                  Auto clean: returns a new cleaned DataFrame
smarteda.clean(df, inplace=True)    Auto clean: modifies original DataFrame directly
smarteda.visualize(df)              Auto charts for every column
smarteda.analyze(df)                Runs ALL functions above in one call
smarteda.report(df)                 Generates a full standalone HTML report

Examples

Basic EDA

smarteda.basic_eda(df)        # default 5 rows
smarteda.basic_eda(df, n=10)  # show 10 rows

Missing Values

smarteda.missing(df)
# Output:
#        Count  Percentage
# age       21       10.24
# salary    15        7.32
# Suggestion: age → Fill with mean | salary → Fill with median
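
For readers who want to see what a count/percentage table like the one above involves, here is a minimal sketch in plain pandas. It is not smarteda's implementation; the helper name `missing_summary` and the sample data are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values, for columns that have any."""
    count = df.isna().sum()
    percentage = (count / len(df) * 100).round(2)
    summary = pd.DataFrame({"Count": count, "Percentage": percentage})
    # Keep only columns with at least one missing value, worst first
    return summary[summary["Count"] > 0].sort_values("Count", ascending=False)


# Hypothetical sample data
df = pd.DataFrame({
    "age": [25, np.nan, 31, np.nan],
    "salary": [50_000, 62_000, np.nan, 58_000],
    "city": ["Pune", "Delhi", "Mumbai", "Pune"],
})
print(missing_summary(df))
```

Columns without missing values (here `city`) are left out, which keeps the table short on wide datasets.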

Outlier Detection

smarteda.outliers(df)
# Output:
# salary → 8 outliers (3.9%) using IQR
# score  → 1 outliers (0.49%) using Z-score
# Multi-dimensional (Isolation Forest) → 39 outliers (19.02%)
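
The IQR and Z-score rules named above can be sketched with pandas alone. This is an illustration of the two standard rules, not smarteda's code: the function names, the `k=1.5` fence multiplier, and the threshold of 3 are common defaults assumed here, and smarteda may use different ones. Isolation Forest is left out because it needs scikit-learn.

```python
import pandas as pd


def iqr_outliers(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Boolean mask: values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return (s < q1 - k * iqr) | (s > q3 + k * iqr)


def zscore_outliers(s: pd.Series, threshold: float = 3.0) -> pd.Series:
    """Boolean mask: values more than `threshold` std devs from the mean."""
    std = s.std(ddof=0)
    if std == 0:
        # A constant column has no spread, so nothing can be an outlier
        return pd.Series(False, index=s.index)
    z = (s - s.mean()) / std
    return z.abs() > threshold


# Hypothetical sample: 19 ordinary values and one extreme value
salary = pd.Series(list(range(40, 59)) + [400], name="salary")
iqr_mask = iqr_outliers(salary)
z_mask = zscore_outliers(salary)
print(f"IQR: {int(iqr_mask.sum())} outlier(s), Z-score: {int(z_mask.sum())} outlier(s)")
```

On very small samples the Z-score rule can miss even an extreme value, because that value inflates the standard deviation it is measured against. This is one reason the two methods can report different counts.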

Smart Suggestions + ML Score

smarteda.suggestions(df)
# Output:
# ⚠️  Column 'salary' is highly skewed → apply log transformation
# ⚠️  'height' and 'weight' are 94% correlated → drop one
# ✅  No duplicates found
# 💡 ML Readiness Score: 87 / 100
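
Warnings like the skewness and correlation ones above can be derived from two pandas calls, `DataFrame.skew()` and `DataFrame.corr()`. The sketch below is an assumption about how such checks might look, not smarteda's actual logic: the helper names and the thresholds (1.0 for skewness, 0.9 for correlation) are chosen for illustration, and the ML Readiness Score is not reproduced because the source does not say how it is computed.

```python
import pandas as pd


def skewed_columns(df: pd.DataFrame, threshold: float = 1.0) -> list:
    """Numeric columns whose absolute skewness exceeds `threshold`."""
    skew = df.select_dtypes(include="number").skew()
    return skew[skew.abs() > threshold].index.tolist()


def correlated_pairs(df: pd.DataFrame, threshold: float = 0.9) -> list:
    """Pairs of numeric columns with |Pearson r| above `threshold`."""
    corr = df.select_dtypes(include="number").corr().abs()
    cols = corr.columns
    pairs = []
    # Walk the upper triangle so each pair is reported once
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            if corr.iloc[i, j] > threshold:
                pairs.append((cols[i], cols[j]))
    return pairs


# Hypothetical sample data
df = pd.DataFrame({
    "height": [150, 160, 170, 180, 190, 200],
    "weight": [50, 56, 59, 66, 70, 76],
    "salary": [30, 32, 500, 31, 33, 35],
})
print("Skewed:", skewed_columns(df))
print("Highly correlated:", correlated_pairs(df))
```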

Auto Clean

# Safe — keeps original df intact
clean_df = smarteda.clean(df)

# Modifies df directly
smarteda.clean(df, inplace=True)
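
The difference between the two calls above is the usual pandas copy-versus-in-place pattern. The sketch below shows that pattern with a hypothetical `clean_sketch` function that only drops duplicate rows. What smarteda.clean actually cleans, and what it returns when `inplace=True`, is not stated in this README; returning `None` here follows the pandas convention and is an assumption.

```python
import pandas as pd


def clean_sketch(df: pd.DataFrame, inplace: bool = False):
    """Hypothetical cleaner: drop duplicate rows, then renumber the index."""
    # Work on the caller's object only when explicitly asked to
    target = df if inplace else df.copy()
    target.drop_duplicates(inplace=True)
    target.reset_index(drop=True, inplace=True)
    # Assumption: mirror pandas and return None for in-place operations
    return None if inplace else target


# Hypothetical sample data with one duplicate row
df = pd.DataFrame({"a": [1, 1, 2], "b": ["x", "x", "y"]})

cleaned = clean_sketch(df)           # df still has 3 rows
print(len(df), len(cleaned))

clean_sketch(df, inplace=True)       # df itself now has 2 rows
print(len(df))
```

Returning a new DataFrame by default is the safer choice in notebooks, where re-running a cell that mutates `df` can silently change later results.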

HTML Report

smarteda.report(df, output_file="my_report.html")
# The report is a standalone HTML file; open it in any browser, no extra tools needed

What smarteda Detects Automatically

  • ✅ Missing values with fill strategy per column
  • ✅ Duplicate rows
  • ✅ Outliers using 3 methods (IQR, Z-score, Isolation Forest)
  • ✅ Skewed distributions with transformation suggestions
  • ✅ Multicollinearity between features
  • ✅ High cardinality categorical columns
  • ✅ Wrong data types (numbers stored as strings, dates as objects)
  • ✅ Constant columns (useless for ML)
  • ✅ Time series trends, seasonality, and gaps
  • ✅ ML Readiness Score out of 100
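
Two of the checks in the list above, constant columns and numbers stored as strings, are simple enough to sketch with pandas. The helpers below are illustrative assumptions, not smarteda's detection logic; the function names and sample data are hypothetical.

```python
import pandas as pd


def constant_columns(df: pd.DataFrame) -> list:
    """Columns with at most one distinct value (missing values count as a value)."""
    return [c for c in df.columns if df[c].nunique(dropna=False) <= 1]


def numeric_stored_as_text(df: pd.DataFrame) -> list:
    """Non-numeric columns where every non-missing value parses as a number."""
    found = []
    for col in df.columns:
        # Skip columns that already have a numeric or datetime dtype
        if pd.api.types.is_numeric_dtype(df[col]):
            continue
        if pd.api.types.is_datetime64_any_dtype(df[col]):
            continue
        values = df[col].dropna()
        if len(values) and pd.to_numeric(values, errors="coerce").notna().all():
            found.append(col)
    return found


# Hypothetical sample data
df = pd.DataFrame({
    "price": ["12", "7.5", "30"],      # numbers stored as strings
    "city": ["Pune", "Delhi", "Pune"],
    "country": ["IN", "IN", "IN"],     # constant column
    "qty": [1, 2, 3],
})
print("Constant:", constant_columns(df))
print("Numeric stored as text:", numeric_stored_as_text(df))
```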

Dependencies

  • pandas
  • numpy
  • matplotlib
  • seaborn
  • scipy
  • scikit-learn
  • jinja2
  • missingno

License

MIT License
