Comprehensive automated CSV data analysis with statistical insights and visualizations

AutoCSV Profiler

A comprehensive toolkit for automated CSV data analysis providing statistical insights, data quality assessment, and interactive visualizations.

Features

Comprehensive Statistical Analysis: Descriptive statistics, distributions, and data summaries
Data Quality Assessment: Missing value analysis, outlier detection, and duplicate identification
Advanced Visualizations: Box plots, histograms, correlation matrices, and KDE plots
Interactive Reports: HTML reports with detailed insights and recommendations
Command-Line Interface: Easy-to-use CLI for immediate analysis
Python API: Programmatic access for integration into data pipelines

Installation

pip install autocsv-profiler

Quick Start

Command Line Usage

# Basic analysis
autocsv-profiler data.csv

# Specify output directory
autocsv-profiler data.csv --output ./my_analysis

# Custom delimiter
autocsv-profiler data.csv --delimiter ";"
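
To call the CLI from a Python script, the invocations above could be assembled as in the following minimal sketch. `build_command` and `run_profiler` are hypothetical helpers, not part of the package, and they assume only the `--output` and `--delimiter` flags shown above.

```python
import subprocess


def build_command(csv_path, output=None, delimiter=None):
    """Assemble an argv list for the autocsv-profiler CLI (hypothetical helper)."""
    cmd = ["autocsv-profiler", csv_path]
    if output is not None:
        cmd += ["--output", output]
    if delimiter is not None:
        cmd += ["--delimiter", delimiter]
    return cmd


def run_profiler(csv_path, **options):
    """Run the CLI; raises CalledProcessError on a non-zero exit code."""
    return subprocess.run(build_command(csv_path, **options), check=True)


# Print the command without running it
print(build_command("data.csv", output="./my_analysis", delimiter=";"))
```

Passing the arguments as a list avoids shell quoting problems with delimiters such as ";".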

Python API Usage

from autocsv_profiler import auto_csv_profiler

# Run comprehensive analysis
auto_csv_profiler.main("data.csv", "output_directory")

# Or import specific functions
from autocsv_profiler.recognize_delimiter import detect_delimiter

delimiter = detect_delimiter("data.csv")
print(f"Detected delimiter: {delimiter}")
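
For comparison, delimiter detection can be sketched with the standard library alone. This is not the implementation behind `detect_delimiter`, whose internals are not described here. It illustrates the general idea with `csv.Sniffer`, and the helper name `sniff_delimiter` and the candidate list are assumptions.

```python
import csv


def sniff_delimiter(sample_text, candidates=",;\t|"):
    """Guess the delimiter of CSV text with the standard library's csv.Sniffer."""
    dialect = csv.Sniffer().sniff(sample_text, delimiters=candidates)
    return dialect.delimiter


sample = "name;age;city\nAda;36;London\nAlan;41;Wilmslow\n"
print(f"Detected delimiter: {sniff_delimiter(sample)}")
```

`csv.Sniffer` raises `csv.Error` when it cannot decide, so real input may need a fallback delimiter.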

Analysis Workflow

Generated Outputs

Statistical Reports

Dataset Overview: Shape, data types, memory usage
Descriptive Statistics: Mean, median, mode, standard deviation
Distribution Analysis: Skewness, kurtosis, normality tests
Categorical Analysis: Frequency tables and unique value counts
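
As a rough illustration of what the descriptive statistics above measure, here is a minimal standard-library sketch. It does not reproduce the package's own computation or report format. The helper name `describe` is hypothetical, and the skewness shown is the simple population (biased) estimate, which may differ from the estimator the package uses.

```python
import statistics


def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    n = len(values)
    mean = statistics.fmean(values)
    pop_sd = statistics.pstdev(values)  # population standard deviation
    # Population skewness: third central moment divided by sigma cubed
    skewness = sum((x - mean) ** 3 for x in values) / (n * pop_sd ** 3)
    return {
        "mean": mean,
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "stdev": statistics.stdev(values),  # sample standard deviation (n - 1)
        "skewness": skewness,
    }


print(describe([2, 4, 4, 6]))
```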

Data Quality Assessment

Missing Values: Patterns, counts, and visualizations
Outliers: IQR-based detection with statistical summaries
Duplicates: Identification and detailed reporting
Data Consistency: Type validation and integrity checks
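
The missing-value and duplicate checks listed above can be approximated in a few lines. This is a sketch under stated assumptions rather than the package's logic: `quality_summary` is a hypothetical helper, and the set of strings treated as missing is an assumption.

```python
import csv
import io
from collections import Counter

MISSING = {"", "NA", "NaN", "null"}  # assumed missing-value markers


def quality_summary(csv_text, delimiter=","):
    """Count rows, missing cells per column, and repeated rows in CSV text."""
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    header = next(reader)
    rows = [tuple(row) for row in reader if row]  # skip blank lines
    missing = {
        column: sum(1 for row in rows if i >= len(row) or row[i].strip() in MISSING)
        for i, column in enumerate(header)
    }
    # Every repeat of an already-seen row counts as one duplicate
    duplicates = sum(count - 1 for count in Counter(rows).values())
    return {"rows": len(rows), "missing": missing, "duplicate_rows": duplicates}


text = "id,city,age\n1,Paris,34\n2,,41\n1,Paris,34\n3,Rome,NA\n"
print(quality_summary(text))
```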

Visualizations

Distribution Plots: Histograms with KDE overlays
Box Plots: Outlier visualization and quartile analysis
Correlation Analysis: Heatmaps and relationship matrices
Missing Data Patterns: Matrix plots and summary charts

Interactive Reports

HTML Dashboard: Comprehensive overview with navigation
Data Dictionary: Detailed variable descriptions
Quality Summary: Actionable insights and recommendations

Output Structure

your_file_analysis/
├── your_file.csv                     # Copy of original data
├── dataset_info.txt                  # Basic dataset information
├── summary_statistics_all.txt        # Comprehensive statistics
├── categorical_summary.txt           # Categorical variable analysis
├── missing_values_report.txt         # Missing data analysis
├── outliers_summary.txt              # Outlier detection results
├── distinct_values_count_by_dtype.html # Interactive value explorer
└── visualization/                    # Generated plots and charts
    ├── box_plots/
    ├── histograms/
    └── correlation_matrices/
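
A pipeline that consumes these outputs may want to confirm that a run produced them. The sketch below checks a directory against the layout above. `missing_outputs` is a hypothetical helper, and it assumes the file names are exactly those listed in the tree.

```python
from pathlib import Path

# File and directory names taken from the output tree above
EXPECTED_FILES = [
    "dataset_info.txt",
    "summary_statistics_all.txt",
    "categorical_summary.txt",
    "missing_values_report.txt",
    "outliers_summary.txt",
    "distinct_values_count_by_dtype.html",
]
EXPECTED_DIRS = ["visualization"]


def missing_outputs(analysis_dir):
    """Return the expected entries that are absent from analysis_dir."""
    root = Path(analysis_dir)
    absent = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    absent += [name for name in EXPECTED_DIRS if not (root / name).is_dir()]
    return absent


print(missing_outputs("your_file_analysis"))
```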

Advanced Features

Missing Value Analysis

Automatic detection of missing value patterns
Visualization of missing data distribution
Imputation suggestions and options
Missing value correlation analysis
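
One simple way to look at missing-value patterns is to count which combinations of columns are empty together. The sketch below is an illustration, not the package's method. `missing_patterns` is a hypothetical helper that treats empty strings and `None` as missing.

```python
from collections import Counter


def missing_patterns(rows, missing_markers=("", None)):
    """Count how often each combination of missing columns occurs.

    rows is a list of dicts mapping column name to cell value.
    """
    patterns = Counter()
    for row in rows:
        absent = tuple(sorted(col for col, val in row.items() if val in missing_markers))
        patterns[absent] += 1
    return patterns


rows = [
    {"age": "34", "city": ""},
    {"age": "", "city": ""},
    {"age": "41", "city": "Rome"},
    {"age": "29", "city": ""},
]
for pattern, count in missing_patterns(rows).most_common():
    print(pattern or "(complete)", count)
```

Columns that often appear in the same pattern are candidates for a closer look at how their missingness is related.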

Outlier Detection

IQR-based outlier identification
Statistical summaries for outliers
Visual outlier highlighting in plots
Outlier impact assessment
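
The IQR rule flags values that fall more than a multiple of the interquartile range outside the first and third quartiles. The following minimal sketch shows the rule itself. The helper name `iqr_outliers`, the conventional multiplier of 1.5, and the quantile interpolation method are assumptions, since the package's exact settings are not stated here.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return (outliers, (lower, upper)) using the IQR rule."""
    # "inclusive" interpolates linearly between data points
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    outliers = [x for x in values if x < lower or x > upper]
    return outliers, (lower, upper)


outliers, bounds = iqr_outliers([10, 12, 11, 13, 12, 11, 100])
print(outliers, bounds)
```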

Statistical Testing

Normality tests (Shapiro-Wilk)
Correlation analysis (Pearson, Spearman)
Chi-square tests for categorical variables
Variance inflation factor (VIF) analysis
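
To make the difference between the two correlation measures concrete, here is a small standard-library sketch. Spearman correlation is Pearson correlation applied to ranks, with tied values sharing their average rank. The function names are illustrative and this is not the package's code.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]  # monotonic but not linear in x
print(pearson(x, y), spearman(x, y))
```

For a monotonic but nonlinear relationship like this one, Spearman reaches 1 while Pearson stays below it.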

Relationship Analysis

Variable correlation matrices
Target variable analysis (if specified)
Feature importance insights
Interaction effect detection

Examples

Basic CSV Analysis

from autocsv_profiler import auto_csv_profiler

# Analyze sales data
auto_csv_profiler.main("sales_data.csv", "sales_analysis")

Custom Analysis Pipeline

from autocsv_profiler import auto_csv_profiler
from autocsv_profiler.recognize_delimiter import detect_delimiter
import pandas as pd

# Detect the delimiter, then load the data for your own inspection
delimiter = detect_delimiter("customer_data.csv")
df = pd.read_csv("customer_data.csv", delimiter=delimiter)
print(df.shape)

# Run comprehensive analysis on the same file
auto_csv_profiler.main("customer_data.csv", "customer_analysis")

Batch Processing

from pathlib import Path
from autocsv_profiler import auto_csv_profiler

# Analyze all CSV files in a directory
for csv_path in sorted(Path("data").glob("*.csv")):
    # One output directory per input file, named after the file stem
    output_dir = Path("analysis") / f"{csv_path.stem}_results"
    auto_csv_profiler.main(str(csv_path), str(output_dir))

Requirements

Python 3.9 or higher
pandas >= 1.5.0
numpy >= 1.24.0
matplotlib >= 3.6.0
seaborn >= 0.12.0
scipy >= 1.10.0
scikit-learn >= 1.2.0
statsmodels >= 0.13.0

All dependencies are automatically installed with pip.

Performance Tips

Large Files: For files larger than 100 MB, consider profiling a random sample first
Memory Usage: Monitor memory on datasets with many categorical variables
Output Management: Delete old analysis directories to save disk space
Parallel Processing: Use batch scripts to process multiple files
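
For the large-file tip, one way to sample without loading the whole file is reservoir sampling, which keeps a uniform random sample of k rows in a single pass. The sketch below is an assumption-laden illustration: `sample_csv` is a hypothetical helper, not part of the package, and it assumes the file has a single header row.

```python
import csv
import random


def sample_csv(src_path, dst_path, k, seed=0, delimiter=","):
    """Write the header plus a uniform random sample of k rows to dst_path."""
    rng = random.Random(seed)
    reservoir = []
    with open(src_path, newline="") as src:
        reader = csv.reader(src, delimiter=delimiter)
        header = next(reader)
        for i, row in enumerate(reader):
            if i < k:
                reservoir.append(row)  # fill the reservoir first
            else:
                j = rng.randint(0, i)  # inclusive on both ends
                if j < k:
                    reservoir[j] = row  # replace with probability k / (i + 1)
    with open(dst_path, "w", newline="") as dst:
        writer = csv.writer(dst, delimiter=delimiter)
        writer.writerow(header)
        writer.writerows(reservoir)
    return len(reservoir)
```

The sampled file can then be passed to `auto_csv_profiler.main` in place of the original. Rows are written in reservoir order, not in their original order.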

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Fork the repository
Create your feature branch (git checkout -b feature/AmazingFeature)
Commit your changes (git commit -m 'Add some AmazingFeature')
Push to the branch (git push origin feature/AmazingFeature)
Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Support

Issues: GitHub Issues
Documentation: GitHub Docs
Changelog: CHANGELOG.md

Version

Current version: 1.1.0

See CHANGELOG.md for version history and updates.

Release history

2.0.0: Oct 9, 2025
1.1.0: Aug 4, 2025 (this version)

Download files

Source Distribution

autocsv_profiler-1.1.0.tar.gz (60.2 kB)

Uploaded Aug 4, 2025 Source

Built Distribution

autocsv_profiler-1.1.0-py3-none-any.whl (41.0 kB)

Uploaded Aug 4, 2025 Python 3

File details

Details for the file autocsv_profiler-1.1.0.tar.gz.

File metadata

Download URL: autocsv_profiler-1.1.0.tar.gz
Upload date: Aug 4, 2025
Size: 60.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.5

File hashes

Hashes for autocsv_profiler-1.1.0.tar.gz
Algorithm	Hash digest
SHA256	`a7e0c3002c7c3e8fe58dd2425c226e5168ffbe5ce86aba6c0775bc2b6b4244c1`
MD5	`b6a66e6e60b39a3151c68023fd8c47df`
BLAKE2b-256	`537fedbc5be51567e380519ee7f1bb0308e5e4a7d0ab888d27c92969e1f0edd0`
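
A downloaded file can be checked against the SHA256 digest listed above with the standard library. This is a minimal sketch: `sha256_of` and `verify_sha256` are hypothetical helper names, and the local file path is an assumption about where the download was saved.

```python
import hashlib

# SHA256 digest listed above for autocsv_profiler-1.1.0.tar.gz
EXPECTED_SDIST_SHA256 = "a7e0c3002c7c3e8fe58dd2425c226e5168ffbe5ce86aba6c0775bc2b6b4244c1"


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path, expected_hex):
    """Return True if the file's digest matches expected_hex (case-insensitive)."""
    return sha256_of(path) == expected_hex.lower()


# Example (assumes the sdist was saved to the current directory):
# verify_sha256("autocsv_profiler-1.1.0.tar.gz", EXPECTED_SDIST_SHA256)
```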

File details

Details for the file autocsv_profiler-1.1.0-py3-none-any.whl.

File metadata

Download URL: autocsv_profiler-1.1.0-py3-none-any.whl
Upload date: Aug 4, 2025
Size: 41.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.5

File hashes

Hashes for autocsv_profiler-1.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`355bbcceb5edc15a9841aa6b8c427867b6981c36dec4815865a48ace26777558`
MD5	`2c29c47669673837ff0cd443addda983`
BLAKE2b-256	`3f8febe85d7261f1f8c32c141575b5a359edbcbad98cbc97e6b6c132f02c4c73`
