AutoCSV Profiler
A Python toolkit for automated CSV data analysis with statistical profiling and visualization.
Overview
AutoCSV Profiler provides automated analysis of CSV files with statistical summaries, data quality assessment, and visualization generation. It features memory-efficient processing, automatic delimiter detection, and a rich console interface.
Key Features:
- Interactive analysis mode with step-by-step guidance
- Automatic delimiter detection and encoding validation
- Memory-efficient chunked processing for large files
- Statistical analysis with descriptive statistics and data quality metrics
- Visualization generation (KDE plots, box plots, Q-Q plots, bar charts, pie charts)
- Rich console interface with progress tracking
- Configurable via CLI flags or environment variables
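You can approximate automatic delimiter detection and encoding validation with the standard library alone. The sketch below uses `csv.Sniffer` to guess the delimiter from a text sample and checks whether raw bytes decode as UTF-8. It illustrates the idea under stated assumptions and is not AutoCSV Profiler's actual implementation. The helper names `detect_delimiter` and `is_valid_utf8`, the candidate delimiter set, and the comma fallback are all hypothetical.

```python
import csv


def detect_delimiter(sample: str, candidates: str = ",;\t|") -> str:
    """Guess the delimiter of a CSV text sample with the standard-library csv.Sniffer."""
    try:
        dialect = csv.Sniffer().sniff(sample, delimiters=candidates)
        return dialect.delimiter
    except csv.Error:
        # Sniffer could not decide (hypothetical policy: fall back to a comma).
        return ","


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the raw bytes decode cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    raw = b"name;age;city\nAda;36;London\nAlan;41;Wilmslow\n"
    print(is_valid_utf8(raw))                     # True
    print(detect_delimiter(raw.decode("utf-8")))  # ;
```

In practice you would read only the first few kilobytes of the file as the sample, so detection stays cheap even for very large inputs.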
Installation
Requirements: Python 3.8 - 3.13
pip install autocsv-profiler
Quick Start
Interactive Mode:
autocsv-profiler
Launches a guided, step-by-step session. Recommended for first-time users.
Direct Analysis:
autocsv-profiler data.csv
Analyzes the file directly with sensible defaults.
Usage
Command Line Interface
# Show help
autocsv-profiler --help
# Basic analysis
autocsv-profiler data.csv
# Custom output directory
autocsv-profiler data.csv --output results/
# Custom delimiter
autocsv-profiler data.csv --delimiter ";"
# Large file processing
autocsv-profiler data.csv --memory-limit 4.0 --chunk-size 20000
# Non-interactive mode
autocsv-profiler data.csv --non-interactive
# Debug mode
autocsv-profiler data.csv --debug
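The `--chunk-size` flag controls chunked processing. Rows are read in fixed-size batches, so the whole file never has to sit in memory at once. The following is a minimal sketch of that idea for a single numeric column. It uses Welford's online algorithm to keep a running mean and standard deviation. `iter_chunks`, `RunningStats`, and `profile_column` are hypothetical names. The tool's own chunking logic, including how it enforces `--memory-limit`, may differ.

```python
import csv
import io
import math
from itertools import islice
from typing import Iterator, List, TextIO


def iter_chunks(handle: TextIO, chunk_size: int, delimiter: str = ",") -> Iterator[List[dict]]:
    """Yield lists of at most `chunk_size` rows; only one chunk is held in memory."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            return
        yield chunk


class RunningStats:
    """Welford's online algorithm: mean and variance without storing the values."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def std(self) -> float:
        """Sample standard deviation (n - 1 denominator); 0.0 for fewer than 2 values."""
        return math.sqrt(self._m2 / (self.n - 1)) if self.n > 1 else 0.0


def profile_column(handle: TextIO, column: str, chunk_size: int = 20000,
                   delimiter: str = ",") -> RunningStats:
    """Stream one numeric column chunk by chunk, skipping blank or non-numeric cells."""
    stats = RunningStats()
    for chunk in iter_chunks(handle, chunk_size, delimiter):
        for row in chunk:
            try:
                stats.update(float(row.get(column)))
            except (TypeError, ValueError):
                continue  # missing, blank, or non-numeric cell
    return stats


if __name__ == "__main__":
    data = io.StringIO("id,price\n1,10\n2,20\n3,\n4,30\n")
    stats = profile_column(data, "price", chunk_size=2)
    print(stats.n, stats.mean, stats.std)  # 3 20.0 10.0
```

Peak memory depends on the chunk size rather than the file size. A larger chunk trades memory for fewer iterations.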
Python API
import autocsv_profiler
# Basic analysis
result_dir = autocsv_profiler.analyze('data.csv')
print(f"Analysis saved to: {result_dir}")
# Custom configuration
result_dir = autocsv_profiler.analyze(
    csv_file_path='data.csv',
    output_dir='results/',
    delimiter=',',
    chunk_size=10000,
    memory_limit_gb=1
)
# Interactive mode
result_dir = autocsv_profiler.analyze(
    csv_file_path='data.csv',
    interactive=True
)
Configuration
Settings can also be supplied as environment variables with the AUTOCSV_ prefix:
# Performance settings
export AUTOCSV_PERFORMANCE_MEMORY_LIMIT_GB=2
export AUTOCSV_PERFORMANCE_CHUNK_SIZE=20000
# Logging settings
export AUTOCSV_LOGGING_LEVEL=DEBUG
export AUTOCSV_LOGGING_CONSOLE_LEVEL=INFO
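The variable names above follow an `AUTOCSV_<SECTION>_<KEY>` pattern. The sketch below shows one way such variables could be grouped into settings sections. Several details are assumptions, not the package's documented behavior: splitting on the first underscore after the prefix, the hypothetical `load_env_settings` helper, and leaving every value as a string instead of converting types.

```python
import os
from typing import Dict, Mapping

PREFIX = "AUTOCSV_"


def load_env_settings(environ: Mapping[str, str] = os.environ) -> Dict[str, Dict[str, str]]:
    """Group AUTOCSV_<SECTION>_<KEY> variables into {section: {key: value}}.

    Assumption: the first word after the prefix names the section, e.g.
    AUTOCSV_PERFORMANCE_CHUNK_SIZE -> settings["performance"]["chunk_size"].
    """
    settings: Dict[str, Dict[str, str]] = {}
    for name, value in environ.items():
        if not name.startswith(PREFIX):
            continue  # unrelated variable
        section, _, key = name[len(PREFIX):].partition("_")
        if not key:
            continue  # no section/key separator; ignore
        settings.setdefault(section.lower(), {})[key.lower()] = value
    return settings


if __name__ == "__main__":
    env = {
        "AUTOCSV_PERFORMANCE_MEMORY_LIMIT_GB": "2",
        "AUTOCSV_PERFORMANCE_CHUNK_SIZE": "20000",
        "AUTOCSV_LOGGING_LEVEL": "DEBUG",
        "HOME": "/home/user",
    }
    print(load_env_settings(env))
    # {'performance': {'memory_limit_gb': '2', 'chunk_size': '20000'}, 'logging': {'level': 'DEBUG'}}
```
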
Output Files
Analysis generates the following files in the output directory:
Data Summaries:
- dataset_analysis.txt - Dataset overview and basic statistics
- numerical_summary.csv - Summary statistics for numeric columns
- categorical_summary.csv - Summary for categorical columns
- numerical_stats.csv - Descriptive statistics using researchpy
- categorical_stats.csv - Categorical frequency analysis
- distinct_values.txt - Unique value counts per column
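To give a sense of what the numerical and categorical summaries contain, the sketch below computes comparable per-column figures with Python's `statistics` module and `collections.Counter`. Some choices here are assumptions: the field set (count, mean, std, min, median, max, distinct, top values), the rule that a column counts as numeric only if every non-blank cell parses as a float, and the hypothetical `summarize` helper. The real output files may use different columns and libraries; the list above mentions researchpy, for example.

```python
import csv
import io
import statistics
from collections import Counter
from typing import Dict


def summarize(csv_text: str, delimiter: str = ",") -> Dict[str, dict]:
    """Return simple per-column statistics: numeric summaries or category counts."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    rows = list(reader)
    summary: Dict[str, dict] = {}
    for column in reader.fieldnames or []:
        # Drop blank cells (and None, which DictReader uses for short rows).
        values = [row[column] for row in rows if row.get(column) not in ("", None)]
        try:
            numbers = [float(v) for v in values]
        except ValueError:
            numbers = []  # at least one non-numeric cell: treat as categorical
        if numbers:
            summary[column] = {
                "count": len(numbers),
                "mean": statistics.mean(numbers),
                "std": statistics.stdev(numbers) if len(numbers) > 1 else 0.0,
                "min": min(numbers),
                "median": statistics.median(numbers),
                "max": max(numbers),
                "distinct": len(set(numbers)),
            }
        else:
            counts = Counter(values)
            summary[column] = {
                "count": len(values),
                "distinct": len(counts),
                "top": counts.most_common(3),  # ties keep first-seen order
            }
    return summary


if __name__ == "__main__":
    result = summarize("city,temp\nOslo,3\nRome,18\nOslo,5\nLima,\n")
    print(result["temp"]["median"])  # 5.0
    print(result["city"]["top"])     # [('Oslo', 2), ('Rome', 1), ('Lima', 1)]
```
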
Visualizations:
- kde_plots/ - Kernel density estimation plots
- box_plots/ - Box plots for numerical variables
- qq_plots/ - Q-Q plots for normality testing
- bar_charts/ - Bar charts for categorical variables
- pie_charts/ - Categorical distribution pie charts
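A Q-Q plot compares sorted observations against the quantiles of a reference distribution. For normality testing, that reference is the standard normal distribution. The sketch below computes the plot coordinates with `statistics.NormalDist` (Python 3.8+) rather than drawing the chart. The plotting positions `(i - 0.5) / n` are one common convention and an assumption here, as is the hypothetical helper name `normal_qq_points`. The package may compute its Q-Q plots differently.

```python
import statistics
from typing import List, Sequence, Tuple


def normal_qq_points(sample: Sequence[float]) -> List[Tuple[float, float]]:
    """Pair each sorted observation with the matching standard-normal quantile.

    Returns (theoretical_quantile, observed_value) pairs using the plotting
    positions (i - 0.5) / n for i = 1..n.
    """
    n = len(sample)
    if n == 0:
        return []
    std_normal = statistics.NormalDist()  # mean 0, standard deviation 1
    ordered = sorted(sample)
    return [
        (std_normal.inv_cdf((i - 0.5) / n), x)
        for i, x in enumerate(ordered, start=1)
    ]


if __name__ == "__main__":
    for theoretical, observed in normal_qq_points([2.0, 9.0, 4.0]):
        print(f"{theoretical:+.3f}  {observed}")
```

Plot these pairs with any charting library, putting theoretical quantiles on the x-axis and observed values on the y-axis, to get the Q-Q plot. Points lying close to a straight line suggest the column is approximately normal.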
Process Logs:
- autocsv_profiler.log - Processing log file
Documentation
User Documentation:
- User Guide - Installation, CLI usage, and examples
- Configuration - Settings and environment variables
- Troubleshooting - Problem-solving guide
Developer Documentation:
- API Reference - Python API documentation
- Developer Guide - Development workflow and architecture
- Architecture Diagrams - Visual system architecture
Complete Index:
- Documentation Index - Complete documentation overview
Contributing
Contributions are welcome! See CONTRIBUTING.md for guidelines.
License
MIT License - see LICENSE for details.
This software includes third-party components. See NOTICE for complete license information.
Links
- PyPI: https://pypi.org/project/autocsv-profiler/
- Repository: https://github.com/dhaneshbb/autocsv-profiler
- Documentation: https://github.com/dhaneshbb/autocsv-profiler/blob/master/docs/index.md
- Issues: https://github.com/dhaneshbb/autocsv-profiler/issues
- Changelog: https://github.com/dhaneshbb/autocsv-profiler/blob/master/CHANGELOG.md
Version: 2.0.0 | Status: Beta | Python: 3.8-3.13
Copyright 2025 dhaneshbb | License: MIT | Homepage: https://github.com/dhaneshbb/autocsv-profiler
Download files
File details
Details for the file autocsv_profiler-2.0.0.tar.gz.
File metadata
- Download URL: autocsv_profiler-2.0.0.tar.gz
- Upload date:
- Size: 148.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 05396fcb48448aaeaf19c575edc1e6381a97bc2b36b95e0c1b2cbc53c5ed9fd4 |
| MD5 | 58903947dd83204c26808e76d92915c4 |
| BLAKE2b-256 | 4f2c05e20e74ca2528a2076b14d43babc2f5d878c19b6f180157bd30d30d0ad6 |
File details
Details for the file autocsv_profiler-2.0.0-py3-none-any.whl.
File metadata
- Download URL: autocsv_profiler-2.0.0-py3-none-any.whl
- Upload date:
- Size: 154.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | bd271f2fb909c0dbe5a060beffd8b33f6887f1837acf9eb9763ec15c4add2541 |
| MD5 | 3546392d2254ac755188596569618b4d |
| BLAKE2b-256 | 13f96bf3243ba11486352d31036f27572d9dd896fa08e6900e843ea4c5758495 |