analytics-eda
Exploratory Data Analysis (EDA) is critical but time-consuming and often inconsistent across projects.
analytics-eda is a lightweight Python library that streamlines EDA with fast, automated analysis to uncover data issues, validate assumptions, and support modeling decisions.
Installation
Requires: Python 3.11 or later
pip install analytics-eda
Quickstart
Here's how to generate a univariate numeric EDA report from a pandas Series:
import json
import numpy as np
import pandas as pd
from analytics_eda.univariate.numeric.univariate_numeric_analysis import univariate_numeric_analysis
# Series to analyze
rng = np.random.default_rng(0)
data = rng.normal(loc=0.0, scale=1.0, size=15)
normal_15_series = pd.Series(data, name="normal_series")
# Analyze numeric series
report_file_path = univariate_numeric_analysis(
    normal_15_series
)
# Access json report
result = json.loads(report_file_path.read_text())
print(json.dumps(result, indent=2))
Features
Univariate Exploratory Data Analysis
Univariate EDA analyzes a single variable at a time to understand its distribution, detect anomalies, and assess data quality.
This step is essential for uncovering data issues before modeling and making informed preprocessing decisions.
Supports:
- Numeric data:
  - Identifies missing data
  - Computes descriptive statistics (mean, median, std, skew, etc.)
  - Evaluates normality with tests and visualizations
  - Detects outliers using IQR, Z-score, and robust Z-score
  - Runs inferential tests (e.g. goodness-of-fit, variance checks)
- Categorical data:
  - Summarizes frequency distributions
  - Flags rare or dominant categories
  - Assesses cardinality and missing values
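As a rough illustration of what the outlier and category checks listed above compute, here is a dependency-free sketch in plain Python. It is not the library's implementation: the function names `outlier_flags` and `categorical_flags`, the thresholds (1.5 × IQR, |z| > 3, robust |z| > 3.5, 5% rare share, 80% dominant share), and the use of `None` as the missing-value marker are all assumptions chosen for the example.

```python
from collections import Counter
from statistics import mean, median, pstdev, quantiles


def outlier_flags(values, iqr_k=1.5, z_cut=3.0, robust_cut=3.5):
    """Return the values flagged by three common outlier rules (illustrative)."""
    # IQR rule: flag values outside [Q1 - k*IQR, Q3 + k*IQR].
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - iqr_k * iqr, q3 + iqr_k * iqr
    by_iqr = [v for v in values if v < low or v > high]

    # Z-score rule: distance from the mean in standard deviations.
    mu, sigma = mean(values), pstdev(values)
    by_z = [v for v in values if sigma > 0 and abs(v - mu) / sigma > z_cut]

    # Robust Z-score: distance from the median scaled by the
    # median absolute deviation (MAD); 0.6745 makes it comparable to z.
    med = median(values)
    mad = median(abs(v - med) for v in values)
    by_robust = [
        v for v in values
        if mad > 0 and 0.6745 * abs(v - med) / mad > robust_cut
    ]
    return {"iqr": by_iqr, "z_score": by_z, "robust_z_score": by_robust}


def categorical_flags(values, rare_share=0.05, dominant_share=0.8):
    """Summarize missing values, cardinality, and rare/dominant categories."""
    present = [v for v in values if v is not None]  # None marks missing (assumption)
    counts = Counter(present)
    total = len(present)
    shares = {cat: n / total for cat, n in counts.items()}
    return {
        "missing": len(values) - total,
        "cardinality": len(counts),
        "rare": sorted(c for c, s in shares.items() if s < rare_share),
        "dominant": sorted(c for c, s in shares.items() if s >= dominant_share),
    }


sample = [10, 11, 12, 12, 13, 13, 14, 15, 95]
flags = outlier_flags(sample)
print(flags)

labels = ["a"] * 36 + ["b"] * 3 + ["c"] + [None] * 2
cat_flags = categorical_flags(labels)
print(cat_flags)
```

With this sample, the single extreme value inflates the standard deviation enough that the plain Z-score rule misses it, while the IQR and robust Z-score rules both flag it. That is one reason to compare several rules rather than rely on one.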
Bivariate Exploratory Data Analysis
Bivariate EDA compares two variables to assess relationships and patterns across groups.
It helps detect important group-level differences and guides feature selection or encoding strategies.
Supports:
- Numeric vs. Categorical:
  - Performs statistical tests (e.g. homogeneity of variance, distribution overlap)
  - Runs global hypothesis tests to compare group distributions
  - Estimates effect size between segments
  - Analyzes numeric patterns within categorical groups
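To make the effect-size step above concrete, the sketch below splits a numeric variable by category and computes Cohen's d between two groups. Which effect-size measure the library reports is not stated here, so Cohen's d is an assumption, as are the helper names `group_values` and `cohens_d`; the code uses only the standard library.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, variance


def group_values(categories, values):
    """Collect numeric values into lists keyed by their category."""
    groups = defaultdict(list)
    for cat, val in zip(categories, values):
        groups[cat].append(val)
    return dict(groups)


def cohens_d(a, b):
    """Cohen's d: mean difference divided by the pooled sample standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)


categories = ["x", "y", "x", "y", "x", "y"]
values = [2, 1, 4, 3, 6, 5]

groups = group_values(categories, values)
group_means = {cat: mean(vals) for cat, vals in groups.items()}
d = cohens_d(groups["x"], groups["y"])
print(group_means, d)
```

Both groups have a sample variance of 4 and their means differ by 1, so the standardized difference is 0.5, which is conventionally read as a medium effect.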
API Reference
See our API Reference.md for details.
Contributing Guidelines
We welcome contributions! Please see CONTRIBUTING.md for details.
License
This project is open source under the Apache License 2.0. See LICENSE for details.
File details
Details for the file analytics_eda-0.1.18.tar.gz.
File metadata
- Download URL: analytics_eda-0.1.18.tar.gz
- Upload date:
- Size: 36.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.11.11
File hashes
Algorithm | Hash digest
---|---
SHA256 | 3eea5e169cbcd8f60804f90c83aad64f22f8b01eea63df97a6333d6cf00b5a6d
MD5 | 4a2fe4f28868cac372a77d5850b94056
BLAKE2b-256 | ecbf5911b6996abb553987504db5e11f6d776ca710e804f01cc541ab4aecfc48
File details
Details for the file analytics_eda-0.1.18-py3-none-any.whl.
File metadata
- Download URL: analytics_eda-0.1.18-py3-none-any.whl
- Upload date:
- Size: 61.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.11.11
File hashes
Algorithm | Hash digest
---|---
SHA256 | f2c76dc0629ba4910ee0a8eaadbecd187d5c7cd38c6b6ab730ea382ecc07e391
MD5 | 620d0c742e23d88a16d6fc1866d91d52
BLAKE2b-256 | 557aa8254eb263c2af3c80748ad3668a14d1b8b1e624f1ba5b05ad2176be961c