analytics-eda

Exploratory Data Analysis (EDA) is critical but time-consuming and often inconsistent across projects.

analytics-eda is a lightweight Python library that streamlines EDA with fast, automated analysis to uncover data issues, validate assumptions, and support modeling decisions.

Installation

Requires: Python 3.11 or later

pip install analytics-eda
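
To confirm that the install succeeded, one option is to query the installed distribution metadata from the standard library. This is a minimal sketch; the helper name `installed_version` is hypothetical and not part of analytics-eda.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    Hypothetical helper for illustration; it only wraps importlib.metadata.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# "analytics-eda" is the distribution name used in the pip command above.
print(installed_version("analytics-eda"))
```

A result of `None` means the package is not visible to the current interpreter, which usually points to a different virtual environment than the one pip installed into.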

Quickstart

Here's how to generate a univariate numeric EDA report from a pandas Series:

import json
import numpy as np
import pandas as pd

from analytics_eda.univariate.numeric.univariate_numeric_analysis import univariate_numeric_analysis

# Series to analyze
rng = np.random.default_rng(0)
data = rng.normal(loc=0.0, scale=1.0, size=15)
normal_15_series = pd.Series(data, name="normal_series")

# Analyze numeric series
report_file_path = univariate_numeric_analysis(
    normal_15_series
)

# Read the JSON report from the returned file path and pretty-print it
result = json.loads(report_file_path.read_text())
print(json.dumps(result, indent=2))

Features

Univariate Exploratory Data Analysis

Univariate EDA analyzes a single variable at a time to understand its distribution, detect anomalies, and assess data quality.
This step is essential for uncovering data issues before modeling and making informed preprocessing decisions.
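
To make "detect anomalies" concrete, the sketch below shows one common way to flag outliers in a numeric Series with the IQR rule, a Z-score, and a MAD-based robust Z-score. It uses plain pandas and is not the library's implementation: the function name `flag_outliers`, the thresholds (1.5, 3.0, 3.5), and the 0.6745 scaling constant are conventional choices assumed here for illustration.

```python
import pandas as pd


def flag_outliers(series: pd.Series, z_thresh: float = 3.0,
                  robust_thresh: float = 3.5, iqr_k: float = 1.5) -> pd.DataFrame:
    """Flag outliers with three rules (hypothetical helper, not analytics-eda API)."""
    x = series.dropna().astype(float)

    # IQR rule: values beyond k * IQR from the quartiles.
    q1, q3 = x.quantile(0.25), x.quantile(0.75)
    iqr = q3 - q1
    iqr_flag = (x < q1 - iqr_k * iqr) | (x > q3 + iqr_k * iqr)

    # Classic Z-score using the population standard deviation (ddof=0).
    std = x.std(ddof=0)
    z = (x - x.mean()) / std if std > 0 else x * 0.0
    z_flag = z.abs() > z_thresh

    # Robust Z-score based on the median and the median absolute deviation.
    median = x.median()
    mad = (x - median).abs().median()
    robust_z = 0.6745 * (x - median) / mad if mad > 0 else x * 0.0
    robust_flag = robust_z.abs() > robust_thresh

    return pd.DataFrame(
        {"iqr": iqr_flag, "z_score": z_flag, "robust_z_score": robust_flag}
    )


sample = pd.Series([10, 11, 12, 11, 10, 12, 11, 10, 12, 11, 100])
print(flag_outliers(sample).sum())
```

The three rules can disagree on small or heavily skewed samples, because a single extreme value inflates the mean and standard deviation but barely moves the median and MAD. That is one reason to look at all three rather than rely on one.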

Supports:

  • Numeric data:

    • Identifies missing data
    • Computes descriptive statistics (mean, median, std, skew, etc.)
    • Evaluates normality with tests and visualizations
    • Detects outliers using IQR, Z-score, and robust Z-score
    • Runs inferential tests (e.g. goodness-of-fit, variance checks)
  • Categorical data:

    • Summarizes frequency distributions
    • Flags rare or dominant categories
    • Assesses cardinality and missing values
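
As a rough illustration of the categorical checks listed above, the following sketch summarizes frequencies, cardinality, and missing values, and flags rare or dominant categories. It is written with plain pandas under assumed thresholds; `summarize_categorical`, `rare_threshold`, and `dominant_threshold` are hypothetical names, not the analytics-eda API.

```python
import pandas as pd


def summarize_categorical(series: pd.Series, rare_threshold: float = 0.05,
                          dominant_threshold: float = 0.8) -> dict:
    """Summarize a categorical Series (hypothetical helper for illustration)."""
    n_total = len(series)
    n_missing = int(series.isna().sum())

    # Share of each category among the non-missing values.
    shares = series.value_counts(normalize=True, dropna=True)

    return {
        "missing_count": n_missing,
        "missing_share": n_missing / n_total if n_total else 0.0,
        "cardinality": int(series.nunique(dropna=True)),
        "frequencies": shares.to_dict(),
        # Assumed cutoffs: below 5% counts as rare, 80% or more as dominant.
        "rare_categories": shares[shares < rare_threshold].index.tolist(),
        "dominant_categories": shares[shares >= dominant_threshold].index.tolist(),
    }


colors = pd.Series(["red"] * 45 + ["blue"] * 4 + ["green"] + [None] * 2)
print(summarize_categorical(colors))
```

Rare categories are common candidates for grouping into an "other" bucket before encoding, while a dominant category suggests the feature may carry little information.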

Bivariate Exploratory Data Analysis

Bivariate EDA compares two variables to assess relationships and patterns across groups.
It helps detect important group-level differences and guides feature selection or encoding strategies.

Supports:

  • Numeric vs. Categorical:
    • Performs statistical tests (e.g. homogeneity of variance, distribution overlap)
    • Runs global hypothesis tests to compare group distributions
    • Estimates effect size between segments
    • Analyzes numeric patterns within categorical groups
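
The numeric vs. categorical checks above can be sketched with SciPy. The example below assumes Levene's test for homogeneity of variance, the Kruskal-Wallis test as the global comparison, and eta squared as the effect size. These are common choices, not necessarily the tests analytics-eda runs, and `compare_groups` is a hypothetical name.

```python
import numpy as np
import pandas as pd
from scipy import stats


def compare_groups(values: pd.Series, groups: pd.Series) -> dict:
    """Compare a numeric variable across categories (illustrative sketch only)."""
    frame = pd.DataFrame({"value": values, "group": groups}).dropna()
    samples = [g["value"].to_numpy(dtype=float) for _, g in frame.groupby("group")]

    # Homogeneity of variance (median-centered Levene, i.e. Brown-Forsythe).
    _, levene_p = stats.levene(*samples, center="median")

    # Global non-parametric test for a difference between group distributions.
    _, kruskal_p = stats.kruskal(*samples)

    # Eta squared: share of total variance explained by group membership.
    grand_mean = frame["value"].mean()
    ss_between = sum(len(s) * (s.mean() - grand_mean) ** 2 for s in samples)
    ss_total = float(((frame["value"] - grand_mean) ** 2).sum())
    eta_squared = float(ss_between / ss_total) if ss_total > 0 else 0.0

    return {
        "levene_p": float(levene_p),
        "kruskal_p": float(kruskal_p),
        "eta_squared": eta_squared,
        "group_means": frame.groupby("group")["value"].mean().to_dict(),
    }


rng = np.random.default_rng(0)
values = pd.Series(np.concatenate([rng.normal(0, 1, 50), rng.normal(5, 1, 50)]))
groups = pd.Series(["a"] * 50 + ["b"] * 50)
print(compare_groups(values, groups))
```

A small p-value from the global test only says that at least one group differs. The effect size and per-group means indicate whether the difference is large enough to matter for feature selection or encoding.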

API Reference

See our API Reference.md for details.

Contributing Guidelines

We welcome contributions! Please see CONTRIBUTING.md for details.

License

This project is open source under the Apache License 2.0. See LICENSE for details.

Download files

Source Distribution

analytics_eda-0.1.18.tar.gz (36.5 kB)

Uploaded Source

Built Distribution

analytics_eda-0.1.18-py3-none-any.whl (61.6 kB)

Uploaded Python 3

File details

Details for the file analytics_eda-0.1.18.tar.gz.

File metadata

  • Download URL: analytics_eda-0.1.18.tar.gz
  • Upload date:
  • Size: 36.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.11.11

File hashes

Hashes for analytics_eda-0.1.18.tar.gz
Algorithm Hash digest
SHA256 3eea5e169cbcd8f60804f90c83aad64f22f8b01eea63df97a6333d6cf00b5a6d
MD5 4a2fe4f28868cac372a77d5850b94056
BLAKE2b-256 ecbf5911b6996abb553987504db5e11f6d776ca710e804f01cc541ab4aecfc48

File details

Details for the file analytics_eda-0.1.18-py3-none-any.whl.

File metadata

  • Download URL: analytics_eda-0.1.18-py3-none-any.whl
  • Upload date:
  • Size: 61.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.11.11

File hashes

Hashes for analytics_eda-0.1.18-py3-none-any.whl
Algorithm Hash digest
SHA256 f2c76dc0629ba4910ee0a8eaadbecd187d5c7cd38c6b6ab730ea382ecc07e391
MD5 620d0c742e23d88a16d6fc1866d91d52
BLAKE2b-256 557aa8254eb263c2af3c80748ad3668a14d1b8b1e624f1ba5b05ad2176be961c
