A simple library for preprocessing and EDA on CSV files
CSVInspector 🕵️‍♂️
CSVInspector is a Python library that automates preprocessing and exploratory data analysis (EDA) on CSV datasets. It helps data scientists and analysts quickly understand the structure and quality of their data in a single run.
✨ Features
- 📌 Basic info summary (rows, columns, data types)
- 🧬 Automatic feature type detection
- 🚫 Missing data detection and heatmaps
- 📉 Correlation matrix + heatmap
- 📊 Distribution plots (before/after outlier removal)
- 🧪 Outlier detection summary
- 🔄 Skewness detection with transformation suggestions
- 🪄 Normalization summaries (MinMaxScaler, StandardScaler)
- 📈 Quantile summaries for quick statistics
- 📋 Optional quality score
- 📄 Comprehensive Markdown report generation
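To give a feel for what several of these checks involve, here is a minimal sketch written with plain pandas. It is not CSVInspector's implementation: the `profile` function, the returned keys, the 1.5 × IQR outlier rule, and the sample data are all assumptions made for illustration.

```python
import io

import pandas as pd

# Tiny in-memory CSV (made-up data) so the sketch runs without a file on disk.
CSV_TEXT = """age,gender,income
25,F,40000
32,M,52000
47,F,61000
51,M,
29,F,45000
38,M,300000
"""


def profile(df: pd.DataFrame) -> dict:
    """Compute a few of the checks listed above (hypothetical helper)."""
    numeric = df.select_dtypes(include="number")

    # Feature types: numeric dtype -> "numerical", anything else -> "categorical".
    feature_types = {
        col: "numerical" if col in numeric.columns else "categorical"
        for col in df.columns
    }

    # Outliers via the 1.5 * IQR rule (an assumed rule), counted per numeric column.
    q1, q3 = numeric.quantile(0.25), numeric.quantile(0.75)
    iqr = q3 - q1
    below = numeric.lt(q1 - 1.5 * iqr, axis="columns")
    above = numeric.gt(q3 + 1.5 * iqr, axis="columns")

    return {
        "shape": df.shape,
        "feature_types": feature_types,
        "missing": df.isna().sum().to_dict(),
        "outliers": (below | above).sum().to_dict(),
        "skewness": numeric.skew().to_dict(),
        "correlation": numeric.corr(),
    }


df = pd.read_csv(io.StringIO(CSV_TEXT))
result = profile(df)
print(result["feature_types"])
print(result["missing"])
print(result["outliers"])
```

A strongly positive skewness value, as `income` has here, is the kind of signal that typically prompts a log or similar transformation.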
📦 Installation
pip install csvinspector
🚀 Usage
Here’s a minimal example using CSVInspector:
from csvinspector import CSVInspector
inspector = CSVInspector("your_dataset.csv")
summary = inspector.run_analysis()
This will generate:
- A detailed Markdown report in `inspection_output/report.md`
- Plots and visualizations in the same folder
- A dictionary object `summary` with all analysis results
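The exact keys of the returned dictionary are not listed here, so a quick way to explore it is to print its structure. The helper below is a small sketch that works on any nested dictionary; `outline` and the stand-in `example` dictionary are hypothetical, and the real keys returned by `run_analysis()` may differ.

```python
def outline(summary: dict, indent: int = 0) -> list[str]:
    """Return one line per key: the key name and the type of its value."""
    lines = []
    for key, value in summary.items():
        lines.append(f"{' ' * indent}{key}: {type(value).__name__}")
        if isinstance(value, dict):
            # Recurse into nested dictionaries with extra indentation.
            lines.extend(outline(value, indent + 2))
    return lines


# Stand-in dictionary for illustration only; not the library's actual output.
example = {
    "basic_info": {"rows": 1000, "columns": 12},
    "feature_types": {"age": "numerical"},
}
print("\n".join(outline(example)))
```

With the library installed, the same helper could be called as `outline(summary)` on the result of `inspector.run_analysis()`.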
🖼️ Sample Output (Markdown)
# 📊 CSV Data Profiling Report
**File Analyzed**: `your_dataset.csv`
**Generated On**: 2025-05-05 15:32:21
## 📌 Basic Info
- Rows: 1000
- Columns: 12
## 🧬 Feature Types
age: numerical
gender: categorical
income: numerical
## 📈 Quantile Summary (first 5 columns)
| | count | mean | std | min | 25% |
|-------|-------|-------|-------|------|-------|
| age | 1000 | 35.4 | 9.2 | 18 | 29 |
| income| 1000 | 55000 | 15000 | 2000 | 45000 |
...
## 🔗 Correlation Matrix (first 5 rows)
| | age | income | score | ... |
|-------|-------|--------|-------|-----|
| age | 1.00 | 0.43 | 0.21 | |
| income| 0.43 | 1.00 | 0.50 | |
...
## 🕳️ Missing Data Heatmap

🛠 Development
git clone https://github.com/abhii14758/csvinspector
cd csvinspector
pip install -e ".[dev]"
To run the analysis from the command line:
python -m csvinspector path/to/your.csv
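To profile several files, the command above can be repeated per file. The sketch below only builds the command lines, one per CSV in a folder. `build_commands` and the `data` folder name are hypothetical, and whether successive runs overwrite `inspection_output/report.md` is not stated here, so check that before batching.

```python
import sys
from pathlib import Path


def build_commands(folder: str) -> list[list[str]]:
    """Build one `python -m csvinspector <file>` command per CSV in `folder`."""
    return [
        [sys.executable, "-m", "csvinspector", str(path)]
        for path in sorted(Path(folder).glob("*.csv"))
    ]


# To actually run them (requires csvinspector to be installed):
# import subprocess
# for cmd in build_commands("data"):
#     subprocess.run(cmd, check=True)
```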
📄 License
This project is licensed under the MIT License. See LICENSE for details.
👤 Author
Abhi
GitHub Profile
🙏 Acknowledgments