Pydata-visualizer

A powerful and intuitive Python library for exploratory data analysis and data profiling. Pydata-visualizer automatically analyzes your dataset, generates interactive visualizations, and provides detailed statistical insights with minimal code.

Features

  • Comprehensive Data Profiling: Analyze numerical, categorical, boolean, and string data types
  • Automated Data Quality Checks: Detect missing values, outliers, skewed distributions, duplicate rows, and more
  • Interactive Visualizations: Generate distribution plots, correlation heatmaps, word clouds, and statistical charts
  • Text Analysis: Automatic word frequency analysis and word cloud generation for text columns
  • Rich HTML Reports: Export analysis to visually appealing and shareable HTML reports
  • Performance Optimized: Fast analysis on large datasets, with a minimal mode that skips type-specific analysis and visualizations
  • Correlation Analysis: Calculate Pearson, Spearman, and Cramér's V correlations between variables
  • Flexible Configuration: Customize analysis thresholds and options via the Settings class

Installation

pip install pydata-visualizer

Quick Start

import pandas as pd
from data_visualizer.profiler import AnalysisReport

# Load your dataset
df = pd.read_csv("your_dataset.csv")

# Create a report with default settings
report = AnalysisReport(df)
report.to_html("report.html")

Advanced Usage

Customizing Analysis Settings

from data_visualizer.profiler import AnalysisReport, Settings

# Configure analysis settings
report_settings = Settings(
    minimal=False,              # Set to True for faster, minimal analysis
    top_n_values=5,             # Show top 5 values in categorical columns
    skewness_threshold=2.0,     # Skewness beyond this value triggers an alert
    outlier_method='iqr',       # Outlier detection method: 'iqr' or 'zscore'
    outlier_threshold=1.5,      # IQR multiplier for outlier detection
    duplicate_threshold=5.0,    # Percentage threshold for duplicate alerts
    text_analysis=True          # Enable word frequency analysis for text columns
)

# Create report with custom settings
report = AnalysisReport(df, settings=report_settings)

# Perform analysis and get results dictionary
results = report.analyse()

# Generate HTML report
report.to_html("custom_report.html")

Report Structure

The generated report includes the following sections, all of which are also available programmatically (see the snippet after this list):

  • Overview: Dataset dimensions, missing values, duplicate rows, and duplicate percentage
  • Variable Analysis: Detailed per-column statistics and visualizations including:
    • Distribution plots for numeric data
    • Bar charts for categorical data
    • Word clouds and frequency analysis for text data
    • Outlier detection and highlighting
  • Sample Data: Head and tail samples of the dataset
  • Correlations: Correlation matrices and heatmaps (Pearson, Spearman, Cramér's V)
  • Data Quality Alerts: Automated detection of data quality issues
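
The same content is exposed as a plain dictionary by analyse(). Its exact key layout is not spelled out in this README, so a safe first step is to inspect it at runtime rather than to assume specific key names:

# Inspect the structure of the results dictionary at runtime;
# key names are discovered here, not assumed.
results = report.analyse()
print(results.keys())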

API Reference

AnalysisReport Class

class AnalysisReport:
    def __init__(self, data, settings=None):
        """
        Initialize the analysis report object.
        
        Parameters:
        -----------
        data : pandas.DataFrame
            The dataset to analyze
        settings : Settings, optional
            Configuration settings for the analysis
        """
        
    def analyse(self):
        """
        Perform the data analysis.
        
        Returns:
        --------
        dict
            A dictionary containing all analysis results
        """
        
    def to_html(self, filename="report.html"):
        """
        Generate an HTML report from the analysis.
        
        Parameters:
        -----------
        filename : str, optional
            Path to save the HTML report (default: "report.html")
        """

Settings Class

class Settings(pydantic.BaseModel):
    """
    Settings for the analysis report.
    
    Attributes:
    -----------
    minimal : bool, default=False
        Whether to perform minimal analysis (skips type-specific analysis and visualizations)
    
    top_n_values : int, default=10
        Number of top values to show for categorical columns (must be >= 1)
    
    skewness_threshold : float, default=1.0
        Threshold for skewness alerts (must be >= 0.0)
    
    outlier_method : str, default='iqr'
        Outlier detection method: 'iqr' (Interquartile Range) or 'zscore'
    
    outlier_threshold : float, default=1.5
        IQR multiplier for outlier detection (must be >= 0.0)
        Standard: 1.5 for moderate outliers, 3.0 for extreme outliers
    
    duplicate_threshold : float, default=5.0
        Percentage of duplicate rows to trigger an alert (must be >= 0.0)
    
    text_analysis : bool, default=True
        Enable word frequency analysis and word cloud generation for text columns
    """

Type Analyzers

The library automatically detects and applies the appropriate analysis for different data types (see the example after this list):

  • Numeric (Integer/Float): Statistical measures (mean, std, quartiles), distribution plots, skewness, kurtosis, outlier detection
  • Categorical/Object: Value counts, cardinality analysis, frequency distributions, top N values
  • String: Unique value counts, cardinality, top N values, word frequency analysis, word cloud generation
  • Boolean: Value counts and proportions
  • Generic: Basic analysis for unrecognized types
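
To exercise each analyzer, profile a small DataFrame with one column per supported type. This sketch uses only the documented API; the column names and values are illustrative:

import pandas as pd
from data_visualizer.profiler import AnalysisReport

df = pd.DataFrame({
    "age":    [25, 32, 47, 51, 38],                        # numeric (integer)
    "score":  [0.91, 0.45, 0.73, 0.12, 0.88],              # numeric (float)
    "city":   ["Pune", "Mumbai", "Pune", "Delhi", "Pune"], # categorical
    "active": [True, False, True, True, False],            # boolean
    "notes":  ["good fit", "needs review", "strong lead",
               "call back", "good fit"],                   # text/string
})

AnalysisReport(df).to_html("types_report.html")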

Correlation Analysis

Three correlation methods are calculated when applicable (see the standalone sketch after this list):

  • Pearson: Linear correlation between numerical variables (range: -1 to 1)
  • Spearman: Rank correlation capturing monotonic relationships (range: -1 to 1)
  • Cramér's V: Measure of association between categorical variables (range: 0 to 1)
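
For reference, all three measures can be reproduced with pandas and SciPy. This is a standalone sketch of the underlying math, not the library's internal code:

import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

# Toy data; real datasets need far more rows for stable estimates.
df = pd.DataFrame({
    "age":    [25, 32, 47, 51, 38, 29],
    "score":  [0.4, 0.5, 0.8, 0.9, 0.6, 0.5],
    "city":   ["Pune", "Mumbai", "Pune", "Delhi", "Pune", "Mumbai"],
    "active": [True, False, True, True, False, True],
})

# Pearson and Spearman between numeric columns (both range -1 to 1).
pearson = df["age"].corr(df["score"], method="pearson")
spearman = df["age"].corr(df["score"], method="spearman")

# Cramér's V between categorical columns (range 0 to 1):
# V = sqrt(chi2 / (n * (min(rows, cols) - 1)))
table = pd.crosstab(df["city"], df["active"])
chi2 = chi2_contingency(table)[0]
n = table.to_numpy().sum()
cramers_v = np.sqrt(chi2 / (n * (min(table.shape) - 1)))

print(pearson, spearman, cramers_v)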

Data Quality Alerts

The library automatically detects potential issues in your data (the IQR rule is illustrated after this list):

  • High Missing Values: Columns with more than 20% missing data
  • Skewness: Distributions exceeding the configured skewness threshold
  • Outliers: Data points detected using IQR or Z-score methods
  • High Duplicates: Duplicate rows exceeding the configured threshold percentage
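
The IQR check corresponds to the outlier_threshold setting: with the default multiplier of 1.5, a value is flagged when it falls outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]. A standalone illustration of the rule, not the library's internal code:

import pandas as pd

s = pd.Series([10, 12, 11, 13, 12, 95])     # 95 is an obvious outlier
q1, q3 = s.quantile(0.25), s.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
print(s[(s < lower) | (s > upper)])         # flags only 95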

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Credits

Created by Aditya Deshmukh (adideshmukh2005@gmail.com)

GitHub: https://github.com/Adi-Deshmukh/Pydata-visualizer
