Universal dataset profiling and intelligence tool

Aniwa

See your data clearly.

Aniwa is an open-source universal dataset profiling and intelligence tool designed for developers, analysts, data engineers, researchers, and modern data teams.

Aniwa helps you instantly understand datasets through:

  • schema profiling
  • data quality analysis
  • statistical summaries
  • intelligent insights
  • rich terminal reports
  • shareable HTML reports

Whether you're working with CSV files, Excel spreadsheets, JSON datasets, or Parquet files, Aniwa gives you a fast and elegant way to inspect and understand data.


Why Aniwa?

Data professionals constantly work with unfamiliar datasets.

Before trusting a dataset, people need to know:

  • What columns exist?
  • What data types are present?
  • Are there missing values?
  • Are there duplicates?
  • Are there suspicious patterns?
  • Which columns might contain IDs or PII?
  • Is the dataset healthy?

Aniwa makes answering those questions simple.


Features

Universal Dataset Support

Aniwa supports multiple modern dataset formats:

  • CSV
  • Excel
  • JSON
  • Parquet

Support planned for future releases:

  • PostgreSQL
  • MySQL
  • DuckDB
  • BigQuery
  • Snowflake

Core Profiling

Aniwa provides:

Dataset Summary

  • row counts
  • column counts
  • dataset size analysis

Schema Profiling

  • type inference
  • mixed type detection
  • schema overview
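
As a rough illustration of what type inference and mixed type detection can look like, here is a minimal standard-library sketch. It is not Aniwa's implementation: the function names `infer_value_type` and `infer_column_type`, the set of type labels, and the rule that integers and floats together count as one numeric column are all assumptions made for this example.

```python
from collections import Counter


def infer_value_type(raw: str) -> str:
    """Classify one raw cell as 'null', 'bool', 'int', 'float', or 'str'."""
    text = raw.strip()
    if text == "":
        return "null"
    if text.lower() in {"true", "false"}:
        return "bool"
    try:
        int(text)
        return "int"
    except ValueError:
        pass
    try:
        float(text)
        return "float"
    except ValueError:
        return "str"


def infer_column_type(values: list[str]) -> dict:
    """Pick a column type and flag columns whose values disagree on type."""
    counts = Counter(infer_value_type(v) for v in values)
    counts.pop("null", None)  # empty cells do not vote on the column type
    if not counts:
        return {"type": "null", "mixed": False}
    # Assumption: a mix of int and float is one numeric column, not "mixed".
    if set(counts) <= {"int", "float"}:
        return {"type": "float" if "float" in counts else "int", "mixed": False}
    dominant, _ = counts.most_common(1)[0]
    return {"type": dominant, "mixed": len(counts) > 1}
```

Calling `infer_column_type(["1", "abc", "x"])` reports a `str` column with `mixed` set to `True`, because one value parses as an integer while the rest do not.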

Data Quality Analysis

  • null analysis
  • duplicate detection
  • uniqueness analysis
  • sparse column detection
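
The per-column checks above can be sketched in a few lines of plain Python. This is an illustrative example only, not Aniwa's code: the function name `column_quality`, the treatment of empty strings as nulls, and the 50% `sparse_threshold` default are assumptions chosen for the sketch.

```python
def column_quality(values: list[str], sparse_threshold: float = 0.5) -> dict:
    """Compute null, uniqueness, and sparsity metrics for one column."""
    total = len(values)
    # Assumption: an empty or whitespace-only cell counts as null.
    non_null = [v for v in values if v.strip() != ""]
    null_count = total - len(non_null)
    null_ratio = null_count / total if total else 0.0
    # Uniqueness is measured over non-null values only.
    unique_ratio = len(set(non_null)) / len(non_null) if non_null else 0.0
    return {
        "null_count": null_count,
        "null_ratio": null_ratio,
        "unique_ratio": unique_ratio,
        "sparse": null_ratio > sparse_threshold,
    }
```

Measuring uniqueness over non-null values keeps a mostly empty column from looking artificially repetitive; a real profiler may make a different choice.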

Statistical Profiling

  • minimum values
  • maximum values
  • mean
  • median
  • standard deviation
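
For reference, the five statistics listed above can be computed with Python's built-in `statistics` module. The sketch below is an illustration rather than Aniwa's implementation; the function name `numeric_summary` is hypothetical, and the choice of the sample standard deviation (rather than the population one) is an assumption.

```python
import statistics


def numeric_summary(values: list[float]) -> dict:
    """Return min, max, mean, median, and standard deviation for a column."""
    if not values:
        return {}
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        # Sample standard deviation needs at least two data points.
        "std": statistics.stdev(values) if len(values) > 1 else 0.0,
    }
```

For the values `[1, 2, 3, 4]` this yields a mean and median of 2.5 and a sample standard deviation of about 1.29.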

Intelligent Insights

  • possible ID detection
  • high-cardinality warnings
  • sparse column warnings
  • suspicious quality patterns
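
Insights of this kind are typically simple rules applied to per-column metrics. The following sketch shows one way such rules could be written; it is not taken from Aniwa's source. The function name `insights_from_metrics`, the message wording, and the thresholds (50% nulls for sparse, 90% unique for high cardinality) are assumptions for illustration.

```python
def insights_from_metrics(
    column: str,
    null_ratio: float,
    unique_ratio: float,
    sparse_null_ratio: float = 0.5,
    high_cardinality_ratio: float = 0.9,
) -> list[str]:
    """Turn per-column ratios (each between 0 and 1) into readable warnings."""
    insights = []
    if null_ratio > sparse_null_ratio:
        insights.append(f"{column}: sparse column ({null_ratio:.0%} null)")
    # Assumption: a fully populated, fully unique column is a possible ID.
    if null_ratio == 0 and unique_ratio == 1:
        insights.append(f"{column}: possible ID (every value is unique)")
    elif unique_ratio >= high_cardinality_ratio:
        insights.append(f"{column}: high cardinality ({unique_ratio:.0%} unique)")
    return insights
```

Keeping the thresholds as parameters makes the rules easy to tune per dataset, since a sensible cut-off for "high cardinality" depends on row count and domain.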

Reporting

Rich Terminal Reports

Aniwa renders its terminal reports with the Rich library, producing readable, developer-friendly output.

JSON Export

Export profiling results as machine-readable JSON for use in scripts and pipelines.

HTML Reports

Generate shareable profiling reports for teams, audits, and debugging workflows.


Installation

Clone the Repository

git clone https://github.com/ReginaldErzoah/Aniwa.git
cd Aniwa

Create a Virtual Environment

python -m venv .venv

Activate the environment:

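Windows (Git Bash)

source .venv/Scripts/activate

Windows (PowerShell)

.venv\Scripts\Activate.ps1

Windows (Command Prompt)

.venv\Scripts\activate.bat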

macOS/Linux

source .venv/bin/activate

Install Dependencies

pip install -r requirements.txt

Install Aniwa locally:

pip install -e .

Usage

Basic Profiling

aniwa examples/customers.csv

Generate JSON Report

aniwa examples/customers.csv --report json --output profile.json

Generate HTML Report

aniwa examples/customers.csv --report html --output profile.html

Fast Profiling Mode

aniwa examples/customers.csv --mode fast

Deep Profiling Mode

aniwa examples/customers.csv --mode deep
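
To run these commands from a script or scheduled job, the CLI can be wrapped with Python's `subprocess` module. The sketch below uses only the flags shown above (`--report`, `--output`, `--mode`); the helper names `build_aniwa_command` and `run_aniwa` are hypothetical, and whether `--mode` can be combined with `--report` in a single call is an assumption that should be checked against the CLI itself.

```python
import subprocess
from typing import Optional


def build_aniwa_command(
    dataset: str,
    report: Optional[str] = None,
    output: Optional[str] = None,
    mode: Optional[str] = None,
) -> list[str]:
    """Assemble the argument list for one aniwa invocation."""
    cmd = ["aniwa", dataset]
    if report:
        cmd += ["--report", report]
    if output:
        cmd += ["--output", output]
    if mode:
        cmd += ["--mode", mode]
    return cmd


def run_aniwa(dataset: str, **options) -> subprocess.CompletedProcess:
    """Run aniwa and raise CalledProcessError if it exits with an error."""
    return subprocess.run(
        build_aniwa_command(dataset, **options),
        check=True,
        capture_output=True,
        text=True,
    )
```

With Aniwa installed in the active environment, `run_aniwa("examples/customers.csv", report="json", output="profile.json")` corresponds to the JSON report command above.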

Example Console Output

┌──────────────────────────────┐
│      Aniwa Dataset Profile   │
├──────────────────────────────┤
│ Rows: 5                      │
│ Columns: 5                   │
│ Duplicate Rows: 1            │
└──────────────────────────────┘
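
To make the three figures in this summary concrete, here is a small standard-library sketch that derives the same kind of counts from CSV text. It is not Aniwa's code. The sample data is invented for the example and is not the content of examples/customers.csv, and counting each extra copy of a row as one duplicate is an assumption about how duplicates are tallied.

```python
import csv
import io
from collections import Counter

# Made-up sample: 5 data rows, 5 columns, the last row repeats the one before.
SAMPLE = """id,name,email,country,age
1,Ama,ama@example.com,GH,29
2,Kofi,kofi@example.com,GH,34
3,Lena,lena@example.com,DE,41
4,Raj,raj@example.com,IN,25
4,Raj,raj@example.com,IN,25
"""


def summarize_csv(text: str) -> dict:
    """Count data rows, columns, and duplicate rows in CSV text."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader, [])
    rows = [tuple(row) for row in reader]
    # Each copy of a row beyond the first counts as one duplicate.
    duplicate_rows = sum(n - 1 for n in Counter(rows).values())
    return {
        "rows": len(rows),
        "columns": len(header),
        "duplicate_rows": duplicate_rows,
    }
```

Calling `summarize_csv(SAMPLE)` returns 5 rows, 5 columns, and 1 duplicate row for this sample.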

Project Structure

Aniwa/
│
├── aniwa/
│   ├── cli.py
│   ├── core/
│   ├── io/
│   ├── models/
│   ├── reports/
│   └── utils/
│
├── tests/
├── examples/
├── README.md
├── CONTRIBUTING.md
├── requirements.txt
└── pyproject.toml

Roadmap

v0.1.0 - MVP Foundation

Core Features

  • CSV support
  • Excel support
  • JSON support
  • Parquet support
  • schema profiling
  • null analysis
  • duplicate detection
  • statistical profiling
  • console reports
  • JSON export
  • HTML reports

Developer Experience

  • Rich terminal UI
  • fast and deep modes
  • profiling insights

v0.2.0 - Intelligence Release

  • correlation analysis
  • outlier detection
  • semantic detection
  • improved insights
  • Markdown reports

v0.3.0 - Universal Connectivity

  • PostgreSQL support
  • MySQL support
  • DuckDB support
  • BigQuery support
  • profiling history
  • snapshot management

v0.4.0 - Extensibility

  • plugin system
  • custom profiling modules
  • community extensions

v0.5.0 - AI Intelligence

  • dataset summarization
  • semantic understanding
  • AI-powered recommendations
  • anomaly explanations

Philosophy

Aniwa is built around a few core principles:

  • universal
  • developer-first
  • fast
  • modular
  • intelligent
  • beautiful
  • automation-friendly

Contributing

Contributions are welcome.

See CONTRIBUTING.md for:

  • development setup
  • contribution guidelines
  • pull request workflow
  • testing instructions

License

Aniwa is released under the MIT License.

See LICENSE for details.
