🕵️‍♂️ Grebes: lightweight, nature-inspired data auditor

Grebes

🕵️‍♂️ Grebes: a lightweight, nature-inspired data quality auditor for structured datasets.


🚀 Features

  • Fast, zero-config audit of CSV, Excel (.xls/.xlsx), JSON and JSON-Lines files
  • Rich CLI output with colored panels, sparklines, and warnings (powered by Rich)
  • Key data quality checks:
    • Missing value counts & percentages
    • Unique-value ratio & (optional) samples
    • Numeric statistics (mean, std, min, max) & IQR-based outlier counts
    • Inline histograms (sparklines) for numeric distributions
    • Date-range for datetime columns
    • Top-N frequencies for low-cardinality text/categorical columns
    • Mixed-type detection & duplicate-row warnings
  • Two modes:
    • CLI: grebes data.csv → instant terminal report
    • Python API: import GrebesAuditor into notebooks or scripts
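
As a rough illustration of the first few checks listed above, the sketch below computes missing-value counts, a unique-value ratio, and an IQR-based outlier count with plain pandas. The function names `column_metrics` and `iqr_outlier_count` and the fence multiplier `k = 1.5` are assumptions made for this example; they are not taken from the Grebes source, which may compute these values differently.

```python
import pandas as pd


def column_metrics(series: pd.Series) -> dict:
    """Missing count/percentage and unique-value ratio for one column."""
    n = len(series)
    missing = int(series.isna().sum())
    unique = int(series.nunique(dropna=True))
    return {
        "missing": missing,
        "missing_pct": round(100 * missing / n, 1) if n else 0.0,
        "unique": unique,
        "unique_ratio": round(unique / n, 3) if n else 0.0,
    }


def iqr_outlier_count(series: pd.Series, k: float = 1.5) -> int:
    """Count values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    values = series.dropna()
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    outside = (values < q1 - k * iqr) | (values > q3 + k * iqr)
    return int(outside.sum())


# Illustrative data only: one missing value and one obvious outlier.
df = pd.DataFrame({"amount": [10.0, 12.0, 11.0, 13.0, None, 500.0]})
metrics = column_metrics(df["amount"])
outliers = iqr_outlier_count(df["amount"])
print(metrics, outliers)
```

Here the quartiles of the five non-missing values are 11 and 13, so the fences sit at 8 and 16 and only the value 500 is counted as an outlier.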

💾 Installation

# From PyPI (when published)
pip install grebes

# Or install your local copy in editable mode for development
git clone https://github.com/yourusername/grebes.git
cd grebes
pip install -e .

Requires Python ≥ 3.7 and the following packages: pandas, numpy, openpyxl (for Excel), and rich.


⚡ CLI Usage

# Basic audit of a CSV file
grebes data.csv

# Audit an Excel sheet
grebes report.xlsx

# Audit a JSON-Lines file
grebes records.jsonl

# Show help / available options
grebes --help

Sample Output

╭──────────────── 🧠 GREBES DIAGNOSTIC REPORT ───────────────╮
│ Rows: 1,000   Cols: 5   Mem: 180.21 KB                     │
╰────────────────────────────────────────────────────────────╯

╭──── id ────────────────────────────────────────────────────╮
│ Type    int64                                              │
│ Missing 0 (0.0%)                                           │
│ Unique  1000                                               │
│ Stats   μ=500.5,σ=288.8,min=1.0,max=1000.0,out=0           │
│ Dist    █▁▂▃▄▅▆▇█                                          │
╰────────────────────────────────────────────────────────────╯

╭─── amount ─────────────────────────────────────────────────╮
│ Type    float64                                            │
│ Missing 0 (0.0%)                                           │
│ Stats   μ=495.4,σ=289.2,min=14.6,max=999.7,out=0           │
│ Dist    ▁▃▄▅▇▆▇▅                                           │
╰────────────────────────────────────────────────────────────╯

… and so on for each column …
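
The Dist rows in the report above are inline histograms. A minimal sketch of how such a sparkline could be produced with NumPy follows; the `sparkline` function, the `BARS` constant, and the default of eight bins are assumptions for illustration and are not taken from the Grebes source.

```python
import numpy as np

BARS = "▁▂▃▄▅▆▇█"  # eight block heights, lowest to highest


def sparkline(values, bins: int = 8) -> str:
    """Render a histogram of `values` as one row of block characters."""
    data = np.asarray(values, dtype=float)
    data = data[~np.isnan(data)]  # ignore missing values
    if data.size == 0:
        return ""
    counts, _ = np.histogram(data, bins=bins)
    peak = counts.max()
    # Scale each bin count to an index into BARS; the fullest bin gets the tallest bar.
    levels = counts * (len(BARS) - 1) // peak
    return "".join(BARS[int(i)] for i in levels)


line = sparkline([1, 2, 2, 3, 3, 3, 4, 4, 4, 4], bins=4)
print(line)
```

With four bins the counts are 1, 2, 3, and 4, which scale to a steadily rising row of bars.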


📦 Python API

import pandas as pd
from grebes.auditor import GrebesAuditor

df = pd.read_csv("data.csv")
auditor = GrebesAuditor(df)
auditor.print_report()
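
Because the API takes a DataFrame, any file that pandas can read can be audited the same way. The sketch below shows one way to pick a pandas reader by file extension before handing the result to GrebesAuditor. The helper `load_table` is hypothetical and not part of Grebes, and the extension-to-reader mapping is an assumption.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_table(path: str) -> pd.DataFrame:
    """Hypothetical helper (not part of Grebes): choose a pandas reader by extension."""
    suffix = Path(path).suffix.lower()
    if suffix == ".csv":
        return pd.read_csv(path)
    if suffix in {".xls", ".xlsx"}:
        return pd.read_excel(path)  # reading .xlsx requires openpyxl
    if suffix == ".jsonl":
        return pd.read_json(path, lines=True)  # one JSON object per line
    if suffix == ".json":
        return pd.read_json(path)
    raise ValueError(f"Unsupported file type: {suffix!r}")


# Demo with a throwaway JSON-Lines file so the example runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "records.jsonl"
    sample.write_text('{"id": 1, "amount": 9.5}\n{"id": 2, "amount": 12.0}\n')
    df = load_table(str(sample))

print(df.shape)

# With Grebes installed, the loaded frame can be passed to the API shown above:
#   from grebes.auditor import GrebesAuditor
#   GrebesAuditor(df).print_report()
```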

How It Works

  1. Reads your file (CSV, Excel, JSON, or JSON-Lines) into a pandas.DataFrame.
  2. Computes column-wise metrics:
    • Missing values
    • Unique ratio (and optional sample values for low-cardinality columns)
    • Descriptive stats & outlier count for numerics
    • Date ranges for datetimes
    • Top frequencies for text/categorical columns
  3. Renders an interactive, colorized report with:
    • Panels per column
    • Sparklines for a quick glance at each distribution
    • Warnings for mixed-type columns & duplicate rows
  4. Makes zero external calls: everything runs locally, so it is safe to use on private data.
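
The warnings mentioned in step 3 can be approximated with a few lines of pandas. This is a sketch under stated assumptions, not the Grebes implementation: `mixed_type_columns` is a hypothetical helper that flags object columns holding more than one Python type, and duplicates are counted with `DataFrame.duplicated()`.

```python
import pandas as pd


def mixed_type_columns(df: pd.DataFrame) -> list:
    """Names of object columns whose non-null values have more than one Python type."""
    mixed = []
    for name in df.select_dtypes(include="object").columns:
        kinds = {type(value).__name__ for value in df[name].dropna()}
        if len(kinds) > 1:
            mixed.append(name)
    return mixed


# Illustrative data only: "code" mixes strings and ints, and one row repeats.
df = pd.DataFrame({
    "id": [1, 2, 2, 3],
    "code": ["A1", 7, 7, "B2"],
})
mixed = mixed_type_columns(df)
duplicate_rows = int(df.duplicated().sum())  # rows identical to an earlier row
print(mixed, duplicate_rows)
```

Everything here runs on the in-memory DataFrame, which matches the local-only behavior described in step 4.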


Contributing

  1. Fork the repo
  2. Create a feature branch: git checkout -b feat/my-awesome-feature
  3. Commit your changes: git commit -m "Add feature X"
  4. Push to your branch: git push origin feat/my-awesome-feature
  5. Open a Pull Request

Please follow the existing code style and add tests for new functionality.


📜 License

MIT License © Your Name. See LICENSE for details.


Built with 💙 and inspired by nature’s grace: light as air, sharp as a grebe’s dive.
