
A simple library for preprocessing and EDA on CSV files

Project description

CSVInspector 🕵️‍♂️

CSVInspector is a Python library that automates preprocessing and exploratory data analysis (EDA) on CSV datasets. It is built to help data scientists and analysts quickly understand the structure and quality of their data in a single run.


✨ Features

  • 📌 Basic info summary (rows, columns, data types)
  • 🧬 Automatic feature type detection
  • 🚫 Missing data detection and heatmaps
  • 📉 Correlation matrix + heatmap
  • 📊 Distribution plots (before/after outlier removal)
  • 🧪 Outlier detection summary
  • 🔄 Skewness detection with transformation suggestions
  • 🪄 Normalization summaries (MinMaxScaler, StandardScaler)
  • 📈 Quantile summaries for quick statistics
  • 📋 Optional quality score
  • 📄 Comprehensive Markdown report generation
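
To give a feel for what feature type detection and missing data detection involve, here is a minimal standard-library sketch. It is not CSVInspector's implementation: the function names, the list of tokens treated as missing, and the rule "a column is numerical if every non-missing value parses as a float" are all assumptions made for illustration.

```python
import csv
import io


def _is_number(text):
    """Return True if the text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def detect_feature_types(rows, missing_tokens=("", "NA", "NaN", "null")):
    """Classify each column and count its missing cells.

    rows: list of dicts as produced by csv.DictReader.
    Returns (types, missing), two dicts keyed by column name.
    The missing_tokens default is an illustrative assumption.
    """
    if not rows:
        return {}, {}
    types, missing = {}, {}
    for column in rows[0].keys():
        values = [row[column] for row in rows]
        # Keep only cells that hold an actual value.
        present = [
            v for v in values
            if v is not None and v.strip() not in missing_tokens
        ]
        missing[column] = len(values) - len(present)
        is_numeric = bool(present) and all(_is_number(v) for v in present)
        types[column] = "numerical" if is_numeric else "categorical"
    return types, missing


# Tiny in-memory CSV so the example runs without any file on disk.
sample = io.StringIO("age,gender,income\n34,F,52000\n,M,61000\n29,F,NA\n")
rows = list(csv.DictReader(sample))
feature_types, missing_counts = detect_feature_types(rows)
print(feature_types)
print(missing_counts)
```

Real datasets usually need more care, such as integer-coded categories, dates, and locale-specific number formats, which this sketch does not attempt to handle.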

📦 Installation

pip install csvinspector

🚀 Usage

Here’s a minimal example using CSVInspector:

from csvinspector import CSVInspector

inspector = CSVInspector("your_dataset.csv")
summary = inspector.run_analysis()

This will generate:

  • A detailed Markdown report in inspection_output/report.md
  • Plots and visualizations in the same folder
  • A dictionary object summary with all analysis results
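
The keys of the returned dictionary are not listed here, so a reasonable first step is to look at its top-level structure before relying on any particular entry. The helpers below are hypothetical and only assume that `summary` is a plain `dict` and that the report lands in `inspection_output/report.md` as stated above. The example dictionary is a stand-in, not real CSVInspector output.

```python
from pathlib import Path


def outline_summary(summary):
    """Return one 'key: type' line per top-level entry of a results dict."""
    return [f"{key}: {type(value).__name__}" for key, value in summary.items()]


def report_exists(output_dir="inspection_output"):
    """Check whether report.md is present in the given output folder."""
    return (Path(output_dir) / "report.md").is_file()


# Stand-in dictionary for illustration only; the real keys may differ.
example_summary = {
    "basic_info": {"rows": 1000, "columns": 12},
    "feature_types": {"age": "numerical"},
    "quality_score": None,
}
for line in outline_summary(example_summary):
    print(line)
```

In practice you would pass the value returned by `run_analysis()` to `outline_summary` and call `report_exists()` afterwards to confirm the report was written.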

🖼️ Sample Output (Markdown)

# 📊 CSV Data Profiling Report

**File Analyzed**: `your_dataset.csv`  
**Generated On**: 2025-05-05 15:32:21

## 📌 Basic Info
- Rows: 1000  
- Columns: 12  

## 🧬 Feature Types

age: numerical
gender: categorical
income: numerical


## 📈 Quantile Summary (first 5 columns)
|       | count | mean  | std   | min  | 25%   |
|-------|-------|-------|-------|------|-------|
| age   | 1000  | 35.4  | 9.2   | 18   | 29    |
| income| 1000  | 55000 | 15000 | 2000 | 45000 |

...

## 🔗 Correlation Matrix (first 5 rows)
|       | age   | income | score | ... |
|-------|-------|--------|-------|-----|
| age   | 1.00  | 0.43   | 0.21  |     |
| income| 0.43  | 1.00   | 0.50  |     |

...

## 🕳️ Missing Data Heatmap
![Missing Data Heatmap](inspection_output/missing_data_heatmap.png)
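
The columns of the quantile summary table (count, mean, std, min, 25%) are standard descriptive statistics. As a rough sketch of how one such row can be computed with only the standard library, assuming sample standard deviation and linear interpolation between data points (the library's actual method is not stated here, and `quantile_summary` is a hypothetical name):

```python
import statistics


def quantile_summary(values):
    """Return count, mean, std, min, quartiles, and max for a list of numbers."""
    # method="inclusive" interpolates linearly between the sorted data points.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "25%": q1,
        "50%": median,
        "75%": q3,
        "max": max(values),
    }


# Made-up ages, used only to exercise the function.
ages = [18, 25, 29, 35, 41, 52]
summary_row = quantile_summary(ages)
print(summary_row)
```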

🛠 Development

git clone https://github.com/abhii14758/csvinspector
cd csvinspector
pip install -e .[dev]

To run the analysis from the command line:

python -m csvinspector path/to/your.csv

📄 License

This project is licensed under the MIT License. See LICENSE for details.


👤 Author

Abhi
GitHub Profile


