Ultra-lightweight micro EDA (exploratory data analysis) tool for small datasets

🚀 microeda

microeda is an ultra-lightweight Python library for exploratory data analysis (EDA) on small datasets (fewer than 10k rows). It detects column types, summarizes statistics, flags missing values and outliers, and reports pairwise relationships between columns.


✨ Features

  • Automatic Column Typing: numeric, categorical, boolean, datetime, text, ID.
  • Smart Summaries:
    • Numeric: mean, median, quartiles, missing values, outliers
    • Categorical: unique counts, top values
    • Datetime: min, max, range, missing
    • Text: token counts, top words
  • Missing Data Analysis:
    • Column-level missing percentages
    • Pairwise missing correlations
  • Outlier Detection:
    • IQR method
    • Z-score method
  • Pairwise Relationships:
    • Pearson correlation for numeric
    • Mutual Information & Cramér's V for categorical
  • CLI Support: Generate Markdown or HTML reports
  • Dependency-light: Only pandas & numpy are required; rich is optional for prettier CLI output
  • Semi-structured Data Support: Detect columns with JSON-like or list-like structures
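
To make the two outlier methods listed above concrete, here is a minimal sketch written with plain pandas and numpy. It is not microeda's own implementation: the function names `iqr_outliers` and `zscore_outliers`, the `k=1.5` fence multiplier, and the `threshold` defaults are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd


def iqr_outliers(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Boolean mask: True where a value lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)


def zscore_outliers(values: pd.Series, threshold: float = 3.0) -> pd.Series:
    """Boolean mask: True where |z| exceeds the threshold (population std)."""
    std = values.std(ddof=0)
    if std == 0 or np.isnan(std):
        # Constant or empty column: nothing can be flagged.
        return pd.Series(False, index=values.index)
    z = (values - values.mean()) / std
    return z.abs() > threshold


# Hypothetical sample column with one obviously extreme value.
ages = pd.Series([22, 25, 27, 29, 30, 31, 33, 35, 120], name="Age")
print(ages[iqr_outliers(ages)].tolist())
print(ages[zscore_outliers(ages, threshold=2.5)].tolist())
```

On very small columns the z-score method can miss outliers that the IQR method catches. With the population standard deviation, no value in a sample of n points can have |z| above sqrt(n - 1), so a threshold of 3 can never fire on the nine values above.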

📦 Installation

From PyPI:

pip install microeda

Or install from source:

git clone https://github.com/SaptarshiMondal123/microeda.git
cd microeda
pip install .

Usage

Python API

import pandas as pd
from microeda import analyze

df = pd.read_csv("your_data.csv")
report = analyze(df, name="My Dataset")

# Inspect your data
print(report["column_types"])
print(report["summaries"])
print(report["missingness"])
print(report["pairwise_hints"])

Printing these keys gives you raw dicts, not a table.

For a readable table like the one under Example Output, use analyze_table instead:

from microeda import analyze_table

analyze_table(df, name="My Dataset")
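
If you do not have a CSV at hand, the sketch below builds a small in-memory DataFrame to try the API on. The data is hypothetical and only mirrors some column names from the example output. The import is guarded so the snippet still runs when microeda is not installed, in which case it falls back to a plain pandas missing-value count.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with a few deliberate gaps.
df = pd.DataFrame(
    {
        "Age": [23, 31, 45, 27, 38, 29],
        "Gender": ["Male", "Female", None, "Male", "Female", "Male"],
        "Salary": [42000.0, 55000.0, np.nan, 61000.0, 58000.0, 47000.0],
        "City": ["Pune", "Delhi", "Pune", "Mumbai", "Delhi", "Pune"],
    }
)

try:
    from microeda import analyze  # same import as in the usage above
except ImportError:
    analyze = None  # microeda not installed; the frame is still usable

if analyze is not None:
    report = analyze(df, name="Sample")
    print(report["column_types"])
else:
    # Fallback: per-column missing counts with plain pandas.
    print(df.isna().sum().to_dict())
```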

CLI

Generate a Markdown or HTML report directly from the terminal:

microeda path/to/data.csv --style md --out report.md
microeda path/to/data.csv --style html --out report.html

Options:

  • --style: output format, md (Markdown) or html (HTML)
  • --out: output file path
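
When reports are generated from a script, it can help to validate the options before calling the CLI. The following is a small sketch, not part of microeda: `build_command` and `run_report` are hypothetical helper names, and only the `--style` and `--out` flags documented above are used.

```python
import shutil
import subprocess

VALID_STYLES = ("md", "html")  # the two values documented for --style


def build_command(csv_path: str, style: str, out: str) -> list[str]:
    """Return the argv list for one microeda CLI call."""
    if style not in VALID_STYLES:
        raise ValueError(f"style must be one of {VALID_STYLES}, got {style!r}")
    return ["microeda", str(csv_path), "--style", style, "--out", str(out)]


def run_report(csv_path: str, style: str, out: str) -> bool:
    """Run the CLI if it is on PATH; return True when it exits with code 0."""
    if shutil.which("microeda") is None:
        return False  # CLI not installed in this environment
    result = subprocess.run(build_command(csv_path, style, out), check=False)
    return result.returncode == 0


# Print the commands that would be run for both report styles.
for style in VALID_STYLES:
    print(" ".join(build_command("path/to/data.csv", style, f"report.{style}")))
```

Passing the arguments as a list avoids shell quoting problems with file paths that contain spaces.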

🌟 Example Output

Dataset: 100 rows x 5 cols

Column Summary:

Column  Type         Unique  Missing  Sample Stats
Age     numeric      30      0        mean=29.8
Gender  categorical  2       5        Male:55, Female:40
Name    text         95      0        avg_tokens=2
Salary  numeric      50      2        mean=55000
City    text         10      0        avg_tokens=1
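
For comparison, the Unique and Missing columns of a table like this can be approximated with plain pandas. This is a rough sketch, not microeda's implementation: `column_summary` is a hypothetical helper, and it reports the raw pandas dtype rather than microeda's inferred column type.

```python
import pandas as pd


def column_summary(df: pd.DataFrame) -> pd.DataFrame:
    """One row per column: dtype, distinct non-null values, missing count and share."""
    rows = []
    for name in df.columns:
        col = df[name]
        rows.append(
            {
                "Column": name,
                "Dtype": str(col.dtype),
                "Unique": int(col.nunique(dropna=True)),
                "Missing": int(col.isna().sum()),
                "Missing %": round(100 * float(col.isna().mean()), 1),
            }
        )
    return pd.DataFrame(rows)


# Hypothetical four-row frame with one gap in each column.
sample = pd.DataFrame(
    {
        "Age": [25, 30, None, 41],
        "Gender": ["Male", "Female", "Male", None],
    }
)
summary = column_summary(sample)
print(summary.to_string(index=False))
```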

Contributing

Contributions are welcome! Feel free to submit issues or pull requests.

  • Fork the repo
  • Create a new branch (git checkout -b feature-name)
  • Make your changes
  • Run tests (pytest)
  • Submit a pull request

License

MIT License © 2025 Saptarshi Mondal

Links

GitHub: https://github.com/SaptarshiMondal123/microeda

PyPI: https://pypi.org/project/microeda/
