
Ultra-lightweight micro EDA (exploratory data analysis) tool for small datasets


🚀 microeda


microeda is an ultra-lightweight Python library for Exploratory Data Analysis (EDA) on small datasets (<10k rows). It detects column types, summarizes statistics, flags missing values and outliers, and surfaces pairwise relationships between columns.


✨ Features

  • Automatic Column Typing: numeric, categorical, boolean, datetime, text, ID.
  • Smart Summaries:
    • Numeric: mean, median, quartiles, missing values, outliers
    • Categorical: unique counts, top values
    • Datetime: min, max, range, missing
    • Text: token counts, top words
  • Missing Data Analysis:
    • Column-level missing percentages
    • Pairwise missing correlations
  • Outlier Detection:
    • IQR method
    • Z-score method
  • Pairwise Relationships:
    • Pearson for numeric
    • Mutual Information & Cramér's V for categorical
  • CLI Support: Generate Markdown or HTML reports
  • Dependency-light: Only pandas & numpy required, optional rich for pretty CLI
  • Semi-structured Data Support: Detect columns with JSON-like or list-like structures
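
To make the two outlier rules listed above concrete, here is a minimal, dependency-free sketch of the IQR and Z-score methods. It is an illustration of the general techniques, not microeda's own implementation: the function names `iqr_outliers` and `zscore_outliers`, the `k=1.5` multiplier, and the thresholds are assumptions chosen for the example.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (hypothetical helper)."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def zscore_outliers(values, threshold=3.0):
    """Return values whose absolute z-score exceeds threshold (hypothetical helper)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        return []  # constant column: no value deviates from the mean
    return [v for v in values if abs(v - mean) / std > threshold]


ages = [22, 25, 27, 29, 30, 31, 33, 35, 120]
print(iqr_outliers(ages))
print(zscore_outliers(ages, threshold=2.5))
```

On very small samples the Z-score rule is conservative: with nine values, no point can reach a population z-score of 3, so the IQR rule is often the more useful of the two at this scale.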

📦 Installation

From PyPI:

pip install microeda

Or install from source:

git clone https://github.com/SaptarshiMondal123/microeda.git
cd microeda
pip install .

Usage

Python API

import pandas as pd
from microeda import analyze

df = pd.read_csv("your_data.csv")
report = analyze(df, name="My Dataset")

# Inspect your data
print(report["column_types"])
print(report["summaries"])
print(report["missingness"])
print(report["pairwise_hints"])

The calls above return raw dictionaries, not a formatted table.

For a readable table like the example output, use analyze_table:

from microeda import analyze_table

analyze_table(df, name="My Dataset")
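
Because analyze returns plain dictionaries, one simple way to inspect or save the result is to serialize it as JSON. The sketch below assumes only the four top-level keys shown above. The helper `report_to_json` and the `sample_report` stand-in are hypothetical, and the real nested layout of the report may differ.

```python
import json


def report_to_json(report, indent=2):
    """Serialize a nested report dict to JSON text (hypothetical helper)."""
    # default=str converts values json cannot encode natively
    # (for example timestamps) into strings instead of raising.
    return json.dumps(report, indent=indent, default=str)


# Hypothetical stand-in for the dict returned by analyze();
# only the four top-level keys come from the usage example above.
sample_report = {
    "column_types": {"Age": "numeric", "Gender": "categorical"},
    "summaries": {"Age": {"mean": 29.8}},
    "missingness": {"Gender": 5},
    "pairwise_hints": [],
}

print(report_to_json(sample_report))
```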

CLI

Generate a Markdown or HTML report directly from the terminal:

microeda path/to/data.csv --style md --out report.md
microeda path/to/data.csv --style html --out report.html

Options:

--style: md (Markdown) or html (HTML)

--out: output file path
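
If you want to produce reports from a script, the CLI can be driven with subprocess. This is a sketch that assumes the microeda executable installed by pip is on your PATH. It uses only the --style and --out options documented above, and the helper names `build_microeda_command` and `run_microeda` are hypothetical.

```python
import subprocess


def build_microeda_command(csv_path, style="md", out_path="report.md"):
    """Build the argument list for the microeda CLI (hypothetical helper)."""
    if style not in ("md", "html"):
        raise ValueError("style must be 'md' or 'html'")
    return ["microeda", str(csv_path), "--style", style, "--out", str(out_path)]


def run_microeda(csv_path, style="md", out_path="report.md"):
    """Run the CLI and return the output path; raises if the command fails."""
    # Assumes the `microeda` executable is available on PATH.
    subprocess.run(build_microeda_command(csv_path, style, out_path), check=True)
    return out_path


# Show the command that would be executed, without running it.
print(build_microeda_command("path/to/data.csv", "html", "report.html"))
```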

🌟 Example Output

Dataset: 100 rows x 5 cols

Column Summary:

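Column | Type        | Unique | Missing | Sample Stats
-------|-------------|--------|---------|--------------------
Age    | numeric     | 30     | 0       | mean=29.8
Gender | categorical | 2      | 5       | Male:55, Female:40
Name   | text        | 95     | 0       | avg_tokens=2
Salary | numeric     | 50     | 2       | mean=55000
City   | text        | 10     | 0       | avg_tokens=1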

Contributing

Contributions are welcome! Feel free to submit issues or pull requests.

  • Fork the repo

  • Create a new branch (git checkout -b feature-name)

  • Make your changes

  • Run tests (pytest)

  • Submit a pull request

License

MIT License © 2025 Saptarshi Mondal

Links

GitHub: https://github.com/SaptarshiMondal123/microeda

PyPI: https://pypi.org/project/microeda/


