A Python library for Exploratory Data Analysis and Profiling.

Pydata-visualizer

A powerful and intuitive Python library for exploratory data analysis and data profiling. Pydata-visualizer automatically analyzes your dataset, generates interactive visualizations, and provides detailed statistical insights with minimal code.

Features

  • 📊 Comprehensive Data Profiling: Analyze numerical, categorical, boolean, and string data types
  • 🔍 Automated Data Quality Checks: Detect missing values, outliers, skewed distributions, and more
  • 📈 Interactive Visualizations: Generate distribution plots, correlation heatmaps, and statistical charts
  • 📝 Rich HTML Reports: Export analysis to visually appealing and shareable HTML reports
  • Performance Optimized: Fast analysis even on large datasets, with an optional minimal mode for quicker runs
  • 🔄 Correlation Analysis: Calculate Pearson, Spearman, and Cramér's V correlations between variables

Installation

pip install pydata-visualizer

Quick Start

import pandas as pd
from data_visualizer.profiler import AnalysisReport

# Load your dataset
df = pd.read_csv("your_dataset.csv")

# Create a report with default settings
report = AnalysisReport(df)
report.to_html("report.html")

Advanced Usage

Customizing Analysis Settings

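import pandas as pd
from data_visualizer.profiler import AnalysisReport, Settings

# Load your dataset (same placeholder file as in Quick Start)
df = pd.read_csv("your_dataset.csv")

# Configure analysis settings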
report_settings = Settings(
    minimal=False,          # Set to True for faster, minimal analysis
    top_n_values=5,         # Show top 5 values in categorical columns (default: 10)
    skewness_threshold=2.0  # Threshold for skewness alerts (default: 1.0)
)

# Create report with custom settings
report = AnalysisReport(df, settings=report_settings)

# Perform analysis and get results dictionary
results = report.analyse()

# Generate HTML report
report.to_html("custom_report.html")

Report Structure

The generated report includes:

  • Overview: Dataset dimensions, missing values, duplicate rows
  • Variable Analysis: Detailed per-column statistics and visualizations
  • Sample Data: Head and tail samples of the dataset
  • Correlations: Correlation matrices and heatmaps
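
To make the Overview and Sample Data sections concrete, here is a minimal sketch of how such figures can be computed with plain pandas. It is not the library's implementation: the `overview` helper, its keys, and the example data are hypothetical and only illustrate the kind of information listed above.

```python
import pandas as pd


def overview(df: pd.DataFrame, n_sample: int = 5) -> dict:
    """Hypothetical helper: dataset-level figures similar to a report overview."""
    return {
        "n_rows": len(df),
        "n_columns": df.shape[1],
        "missing_cells": int(df.isna().sum().sum()),   # total NaN/None cells
        "duplicate_rows": int(df.duplicated().sum()),  # repeats of an earlier row
        "head": df.head(n_sample),                     # first rows as sample data
        "tail": df.tail(n_sample),                     # last rows as sample data
    }


# Made-up example data with one duplicate row and two missing cells
df = pd.DataFrame({
    "age": [25, 32, None, 32, 41],
    "city": ["Pune", "Delhi", "Pune", "Delhi", None],
})
summary = overview(df, n_sample=2)
print(summary["n_rows"], summary["n_columns"],
      summary["missing_cells"], summary["duplicate_rows"])
```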

API Reference

AnalysisReport Class

class AnalysisReport:
    def __init__(self, data, settings=None):
        """
        Initialize the analysis report object.
        
        Parameters:
        -----------
        data : pandas.DataFrame
            The dataset to analyze
        settings : Settings, optional
            Configuration settings for the analysis
        """
        
    def analyse(self):
        """
        Perform the data analysis.
        
        Returns:
        --------
        dict
            A dictionary containing all analysis results
        """
        
    def to_html(self, filename="report.html"):
        """
        Generate an HTML report from the analysis.
        
        Parameters:
        -----------
        filename : str, optional
            Path to save the HTML report (default: "report.html")
        """

Settings Class

class Settings(pydantic.BaseModel):
    """
    Settings for the analysis report.
    
    Attributes:
    -----------
    minimal : bool, default=False
        Whether to perform minimal analysis
    top_n_values : int, default=10
        Number of top values to show for categorical columns
    skewness_threshold : float, default=1.0
        Threshold for skewness alerts
    """

Type Analyzers

The library automatically detects and applies the appropriate analysis for different data types:

  • Numeric: Statistical measures, distribution plots, skewness, kurtosis
  • Categorical/String: Value counts, cardinality, frequency distributions
  • Boolean: Value counts and proportions
  • Generic: Basic analysis for unrecognized types
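
The dispatch described above can be sketched with the dtype checks in `pandas.api.types`. This is an illustration under assumptions, not the library's actual detection logic: the `detect_kind` function and its return labels are hypothetical.

```python
import pandas as pd
from pandas.api import types as ptypes


def detect_kind(series: pd.Series) -> str:
    """Hypothetical classifier mapping a column to one of the four analyzer kinds."""
    # Check bool first: pandas counts the bool dtype as numeric.
    if ptypes.is_bool_dtype(series):
        return "boolean"
    if ptypes.is_numeric_dtype(series):
        return "numeric"
    if (
        isinstance(series.dtype, pd.CategoricalDtype)
        or ptypes.is_object_dtype(series)
        or ptypes.is_string_dtype(series)
    ):
        return "categorical"
    # Anything else (for example datetimes) falls back to the generic kind.
    return "generic"


# Made-up example data covering each kind
df = pd.DataFrame({
    "price": [1.5, 2.0],
    "in_stock": [True, False],
    "name": ["apple", "pear"],
    "added": pd.to_datetime(["2024-01-01", "2024-01-02"]),
})
kinds = {column: detect_kind(df[column]) for column in df.columns}
print(kinds)
```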

Correlation Analysis

Three correlation methods are calculated when possible:

  • Pearson: Linear correlation between numerical variables
  • Spearman: Rank correlation capturing monotonic relationships
  • Cramér's V: Measure of association between categorical variables
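
For readers who want to see what these three measures compute, the sketch below reproduces them with pandas and NumPy. It is an independent illustration, not the library's code: `cramers_v` is a hypothetical helper that uses the plain chi-squared formula without bias correction, and the data is made up.

```python
import numpy as np
import pandas as pd


def cramers_v(x: pd.Series, y: pd.Series) -> float:
    """Cramér's V from the chi-squared statistic of the contingency table."""
    observed = pd.crosstab(x, y).to_numpy(dtype=float)
    n = observed.sum()
    # Expected counts under independence: (row total * column total) / n
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / n
    chi2 = ((observed - expected) ** 2 / expected).sum()
    k = min(observed.shape) - 1
    return float(np.sqrt(chi2 / (n * k))) if k > 0 else float("nan")


df = pd.DataFrame({
    "x": [1, 2, 3, 4, 5],
    "y": [1, 4, 9, 16, 25],  # monotonic but not linear in x
    "colour": ["r", "r", "g", "g", "b"],
    "shape": ["sq", "sq", "ci", "ci", "tr"],  # fully determined by colour
})

pearson = df[["x", "y"]].corr(method="pearson")    # linear correlation
spearman = df[["x", "y"]].corr(method="spearman")  # rank correlation
association = cramers_v(df["colour"], df["shape"])

print(pearson.loc["x", "y"], spearman.loc["x", "y"], association)
```

Because `y` grows monotonically with `x` but not linearly, the Spearman coefficient reaches 1 while the Pearson coefficient stays slightly below it.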

Data Quality Alerts

The library automatically detects potential issues in your data:

  • High Missing Values: Columns with significant missing data
  • Skewness: Distributions whose skewness exceeds skewness_threshold (default: 1.0)
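
A rough sketch of how alerts like these can be derived with pandas follows. The `quality_alerts` function and the `missing_threshold` parameter are hypothetical and chosen for illustration; only `skewness_threshold` mirrors a name from the Settings class, and using the absolute skewness is an assumption.

```python
import pandas as pd


def quality_alerts(
    df: pd.DataFrame,
    missing_threshold: float = 0.2,   # hypothetical: share of missing values
    skewness_threshold: float = 1.0,  # mirrors the Settings default
) -> list[str]:
    """Hypothetical helper returning human-readable data quality alerts."""
    alerts = []

    # Share of missing values per column
    for column, share in df.isna().mean().items():
        if share > missing_threshold:
            alerts.append(f"{column}: {share:.0%} missing values")

    # Sample skewness of numeric columns (NaN values are skipped by pandas)
    skewness = df.select_dtypes(include="number").skew()
    for column, value in skewness.items():
        if abs(value) > skewness_threshold:  # assumption: compare absolute value
            alerts.append(f"{column}: skewness {value:.2f}")

    return alerts


# Made-up data: one extreme income value, mostly missing ages
df = pd.DataFrame({
    "income": [20, 22, 21, 23, 25, 24, 22, 500],
    "age": [30, None, None, 40, None, 50, None, None],
})
alerts = quality_alerts(df)
for alert in alerts:
    print(alert)
```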

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Credits

Created by Aditya Deshmukh (adideshmukh2005@gmail.com)

GitHub: https://github.com/Adi-Deshmukh/Pydata-visualizer
