AutoCSV Profiler

A Python toolkit for automated CSV data analysis with statistical profiling and visualization.

Overview

AutoCSV Profiler provides automated analysis of CSV files with statistical summaries, data quality assessment, and visualization generation. It features memory-efficient processing, automatic delimiter detection, and a rich console interface.

Key Features:

  • Interactive analysis mode with step-by-step guidance
  • Automatic delimiter detection and encoding validation
  • Memory-efficient chunked processing for large files
  • Statistical analysis with descriptive statistics and data quality metrics
  • Visualization generation (KDE plots, box plots, Q-Q plots, bar charts, pie charts)
  • Rich console interface with progress tracking
  • Configurable via CLI flags or environment variables
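To make the delimiter-detection feature concrete, here is a minimal sketch built on the standard library's csv.Sniffer. It is not AutoCSV Profiler's own detection logic, which is not described here; the function name sniff_delimiter and the candidate set are assumptions made for illustration.

```python
import csv


def sniff_delimiter(sample: str, candidates: str = ",;\t|") -> str:
    """Guess the delimiter of a CSV text sample (hypothetical helper)."""
    try:
        # Sniffer picks the candidate character whose per-line count is most consistent.
        return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter
    except csv.Error:
        # Sniffer could not decide (for example, single-column data); assume a comma.
        return ","


sample = "name;age;city\nAda;36;London\nLinus;54;Portland\n"
print(repr(sniff_delimiter(sample)))
```

When detection picks the wrong character, the --delimiter flag shown under Command Line Interface overrides it.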

Installation

Requirements: Python 3.8 - 3.13

pip install autocsv-profiler

Quick Start

Interactive Mode:

autocsv-profiler

Step-by-step guidance for first-time users.

Direct Analysis:

autocsv-profiler data.csv

Quick analysis with sensible defaults.

Usage

Command Line Interface

# Show help
autocsv-profiler --help

# Basic analysis
autocsv-profiler data.csv

# Custom output directory
autocsv-profiler data.csv --output results/

# Custom delimiter
autocsv-profiler data.csv --delimiter ";"

# Large file processing
autocsv-profiler data.csv --memory-limit 4.0 --chunk-size 20000

# Non-interactive mode
autocsv-profiler data.csv --non-interactive

# Debug mode
autocsv-profiler data.csv --debug
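
The --chunk-size option suggests that rows are read in fixed-size batches rather than all at once. The sketch below shows that general technique with the standard library only. It is an illustration under that assumption, not the profiler's internals, and the helper names iter_chunks and column_mean are hypothetical.

```python
import csv
import io
from itertools import islice
from typing import Dict, Iterator, List, TextIO


def iter_chunks(handle: TextIO, chunk_size: int, delimiter: str = ",") -> Iterator[List[Dict[str, str]]]:
    """Yield lists of at most chunk_size rows so the whole file never sits in memory."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            return
        yield chunk


def column_mean(handle: TextIO, column: str, chunk_size: int = 20000) -> float:
    """Compute a column mean from running totals, one chunk at a time."""
    total, count = 0.0, 0
    for chunk in iter_chunks(handle, chunk_size):
        # Skip empty cells instead of treating them as zero.
        values = [float(row[column]) for row in chunk if row[column] != ""]
        total += sum(values)
        count += len(values)
    return total / count if count else float("nan")


demo = io.StringIO("id,price\n1,10\n2,20\n3,\n4,30\n")
print(column_mean(demo, "price", chunk_size=2))
```

Only running totals are kept between chunks, so peak memory depends on the chunk size rather than on the file size.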

Python API

import autocsv_profiler

# Basic analysis
result_dir = autocsv_profiler.analyze('data.csv')
print(f"Analysis saved to: {result_dir}")

# Custom configuration
result_dir = autocsv_profiler.analyze(
    csv_file_path='data.csv',
    output_dir='results/',
    delimiter=',',
    chunk_size=10000,
    memory_limit_gb=1
)

# Interactive mode
result_dir = autocsv_profiler.analyze(
    csv_file_path='data.csv',
    interactive=True
)

Configuration

Settings can also be supplied through environment variables that use the AUTOCSV_ prefix:

# Performance settings
export AUTOCSV_PERFORMANCE_MEMORY_LIMIT_GB=2
export AUTOCSV_PERFORMANCE_CHUNK_SIZE=20000

# Logging settings
export AUTOCSV_LOGGING_LEVEL=DEBUG
export AUTOCSV_LOGGING_CONSOLE_LEVEL=INFO
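
The variable names above appear to follow an AUTOCSV_<SECTION>_<KEY> pattern. The sketch below shows one way such names could be grouped into a nested dictionary. The parsing rule and the function load_env_overrides are assumptions for illustration; how the package actually reads these variables is not documented here.

```python
import os
from typing import Dict, Mapping

PREFIX = "AUTOCSV_"


def load_env_overrides(environ: Mapping[str, str] = os.environ) -> Dict[str, Dict[str, str]]:
    """Group AUTOCSV_<SECTION>_<KEY> variables into {section: {key: value}} (hypothetical)."""
    config: Dict[str, Dict[str, str]] = {}
    for name, value in environ.items():
        if not name.startswith(PREFIX):
            continue
        # Split on the first underscore after the prefix: section, then the rest as key.
        section, _, key = name[len(PREFIX):].partition("_")
        if not key:
            continue  # No section/key split, so ignore the variable.
        config.setdefault(section.lower(), {})[key.lower()] = value
    return config


example = {
    "AUTOCSV_PERFORMANCE_MEMORY_LIMIT_GB": "2",
    "AUTOCSV_LOGGING_LEVEL": "DEBUG",
    "PATH": "/usr/bin",
}
print(load_env_overrides(example))
```

Values stay as strings in this sketch; converting them to numbers or log levels would be a separate step.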

Output Files

Analysis generates the following files in the output directory:

Data Summaries:

  • dataset_analysis.txt - Dataset overview and basic statistics
  • numerical_summary.csv - Summary statistics for numeric columns
  • categorical_summary.csv - Summary for categorical columns
  • numerical_stats.csv - Descriptive statistics using researchpy
  • categorical_stats.csv - Categorical frequency analysis
  • distinct_values.txt - Unique value counts per column

Visualizations:

  • kde_plots/ - Kernel density estimation plots
  • box_plots/ - Box plots for numerical variables
  • qq_plots/ - Q-Q plots for normality testing
  • bar_charts/ - Bar charts for categorical variables
  • pie_charts/ - Categorical distribution pie charts

Process Logs:

  • autocsv_profiler.log - Processing log file
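
In a script or CI job it can be useful to confirm that a run produced the files listed above. The following sketch checks an output directory against those names. The helper missing_outputs is hypothetical, and it assumes every listed item is written on every run, which may not hold (for example, for a dataset with no categorical columns).

```python
from pathlib import Path
from typing import List, Union

# File and directory names taken from the lists above.
EXPECTED_FILES = [
    "dataset_analysis.txt",
    "numerical_summary.csv",
    "categorical_summary.csv",
    "numerical_stats.csv",
    "categorical_stats.csv",
    "distinct_values.txt",
    "autocsv_profiler.log",
]
EXPECTED_DIRS = ["kde_plots", "box_plots", "qq_plots", "bar_charts", "pie_charts"]


def missing_outputs(output_dir: Union[str, Path]) -> List[str]:
    """Return the expected files and directories absent from output_dir."""
    root = Path(output_dir)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    # Directories are reported with a trailing slash to tell them apart from files.
    missing += [name + "/" for name in EXPECTED_DIRS if not (root / name).is_dir()]
    return missing
```

For example, missing_outputs(result_dir) could be called on the value returned by autocsv_profiler.analyze, and an empty list would mean every listed item is present.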

Contributing

Contributions are welcome! See CONTRIBUTING.md for guidelines.

License

MIT License - see LICENSE for details.

This software includes third-party components. See NOTICE for complete license information.

Version: 2.0.0 | Status: Beta | Python: 3.8-3.13

Copyright 2025 dhaneshbb | License: MIT | Homepage: https://github.com/dhaneshbb/autocsv-profiler
