🕵️‍♂️ Grebes: lightweight, nature-inspired data auditor

Grebes

🕵️‍♂️ Grebes: a lightweight, nature-inspired data quality auditor for structured datasets.


🚀 Features

  • Fast, zero-config audit of CSV, Excel (.xls/.xlsx), JSON and JSON-Lines files
  • Rich CLI output with colored panels, sparklines, and warnings (powered by Rich)
  • Key data quality checks:
    • Missing value counts & percentages
    • Unique-value ratio & (optional) samples
    • Numeric statistics (mean, std, min, max) & IQR-based outlier counts
    • Inline histograms (sparklines) for numeric distributions
    • Date-range for datetime columns
    • Top-N frequencies for low-cardinality text/categorical columns
    • Mixed-type detection & duplicate-row warnings
  • Two modes:
    • CLI: grebes data.csv → instant terminal report
    • Python API: import GrebesAuditor into notebooks or scripts
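
To make the checks above concrete, here is a minimal sketch of how missing-value counts, the unique-value ratio, and an IQR-based outlier count could be computed with pandas. The helper name `column_checks` is hypothetical and the 1.5 × IQR threshold (Tukey's rule) is an assumption; this is not necessarily how Grebes computes them internally.

```python
import pandas as pd


def column_checks(series: pd.Series) -> dict:
    """Hypothetical helper: basic quality metrics for a single column."""
    n = len(series)
    missing = int(series.isna().sum())
    result = {
        "missing": missing,
        "missing_pct": round(100 * missing / n, 1) if n else 0.0,
        # nunique() ignores missing values by default
        "unique_ratio": series.nunique() / n if n else 0.0,
    }
    is_numeric = pd.api.types.is_numeric_dtype(series)
    is_bool = pd.api.types.is_bool_dtype(series)
    if is_numeric and not is_bool:
        values = series.dropna()
        q1, q3 = values.quantile(0.25), values.quantile(0.75)
        iqr = q3 - q1
        # Assumed rule: flag points more than 1.5 * IQR outside the quartiles
        outliers = (values < q1 - 1.5 * iqr) | (values > q3 + 1.5 * iqr)
        result["outliers"] = int(outliers.sum())
    return result


df = pd.DataFrame({"amount": [10.0, 12.0, 11.0, 13.0, None, 500.0]})
print(column_checks(df["amount"]))
```

In this toy column the value 500.0 is the only point outside the IQR fences, and one of the six entries is missing.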

💾 Installation

# From PyPI (when published)
pip install grebes

# Or install your local copy in editable mode for development
git clone https://github.com/yourusername/grebes.git
cd grebes
pip install -e .

Requires Python ≥ 3.7 and the following packages: pandas, numpy, openpyxl (for Excel), and rich.


⚡ CLI Usage

# Basic audit of a CSV file
grebes data.csv

# Audit an Excel sheet
grebes report.xlsx

# Audit a JSON-Lines file
grebes records.jsonl

# Show help / available options
grebes --help
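
For readers who want the same file-type handling from Python, the sketch below shows one plausible way to pick a pandas reader from the file extension. The function name `load_table` and the exact extension list are assumptions for illustration; the Grebes CLI may dispatch differently.

```python
from pathlib import Path

import pandas as pd


def load_table(path: str) -> pd.DataFrame:
    """Hypothetical loader: choose a pandas reader from the file extension."""
    suffix = Path(path).suffix.lower()
    if suffix == ".csv":
        return pd.read_csv(path)
    if suffix in {".xls", ".xlsx"}:
        # pandas reads .xlsx through openpyxl
        return pd.read_excel(path)
    if suffix == ".json":
        return pd.read_json(path)
    if suffix in {".jsonl", ".ndjson"}:
        # JSON-Lines: one JSON object per line
        return pd.read_json(path, lines=True)
    raise ValueError(f"Unsupported file type: {suffix!r}")
```

The resulting DataFrame can then be passed to GrebesAuditor in the same way as any other DataFrame.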

Sample Output

╭─────────────── 🧠 GREBES DIAGNOSTIC REPORT ────────────────╮
│ Rows: 1,000   Cols: 5   Mem: 180.21 KB                     │
╰────────────────────────────────────────────────────────────╯

╭──── id ────────────────────────────────────────────────────╮
│ Type    int64                                              │
│ Missing 0 (0.0%)                                           │
│ Unique  1000                                               │
│ Stats   μ=500.5,σ=288.8,min=1.0,max=1000.0,out=0           │
│ Dist    █▁▂▃▄▅▆▇█                                          │
╰────────────────────────────────────────────────────────────╯

╭─── amount ─────────────────────────────────────────────────╮
│ Type    float64                                            │
│ Missing 0 (0.0%)                                           │
│ Stats   μ=495.4,σ=289.2,min=14.6,max=999.7,out=0           │
│ Dist    ▁▃▄▅▇▆▇▅                                           │
╰────────────────────────────────────────────────────────────╯

… and so on for each column …


📦 Python API

```python
import pandas as pd
from grebes.auditor import GrebesAuditor

# Load the data into a DataFrame, then run the audit on it
df = pd.read_csv("data.csv")
auditor = GrebesAuditor(df)
auditor.print_report()
```

📝 How It Works

  1. Reads your file (CSV, Excel, JSON, or JSON-Lines) into a pandas.DataFrame.

  2. Computes column-wise metrics:

    • Missing values
    • Unique ratio (and optional sample values for low-cardinality columns)
    • Descriptive stats & outlier count for numerics
    • Date ranges for datetimes
    • Top frequencies for text/categorical
  3. Renders an interactive, colorized report with:

    • Panels per column
    • Sparklines for quick distribution glance
    • Warnings for mixed-type columns & duplicates
  4. Makes no external calls: everything runs locally, so it is safe to use on private data.
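
The rendering and warning steps above can be illustrated with a short sketch. It assumes an 8-bin histogram mapped onto Unicode block characters for the sparkline, and treats a column as mixed-type when its non-null values have more than one Python type; the names `sparkline` and `is_mixed_type` are hypothetical and Grebes may implement these steps differently.

```python
import numpy as np
import pandas as pd

BLOCKS = "▁▂▃▄▅▆▇█"


def sparkline(values, bins: int = 8) -> str:
    """Render a histogram of `values` as one row of block characters."""
    data = pd.Series(values).dropna().to_numpy(dtype=float)
    if data.size == 0:
        return ""
    counts, _ = np.histogram(data, bins=bins)
    top = counts.max()
    # Scale each bin count to an index into BLOCKS (0 = lowest bar)
    return "".join(BLOCKS[int(c * (len(BLOCKS) - 1) / top)] for c in counts)


def is_mixed_type(series: pd.Series) -> bool:
    """True when the non-null values have more than one Python type."""
    return series.dropna().map(type).nunique() > 1


df = pd.DataFrame({"code": [1, "a", 2, 1], "n": [1, 1, 2, 1]})
print(sparkline([1, 1, 1, 2, 3, 3]))   # one character per histogram bin
print(is_mixed_type(df["code"]))       # ints and a string share this column
print(int(df.duplicated().sum()))      # rows that repeat an earlier row
```

Here `DataFrame.duplicated()` marks every row that exactly repeats an earlier one, which is a simple basis for a duplicate-row warning.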


🤝 Contributing

  1. Fork the repo
  2. Create a feature branch: git checkout -b feat/my-awesome-feature
  3. Commit your changes: git commit -m "Add feature X"
  4. Push to your branch: git push origin feat/my-awesome-feature
  5. Open a Pull Request

Please follow the existing code style and add tests for new functionality.


📜 License

MIT License © Your Name. See LICENSE for details.


Built with 💙 and inspired by nature's grace: light as air, sharp as a grebe's dive.
