A Python library for Exploratory Data Analysis and Profiling.
Pydata-visualizer
A powerful and intuitive Python library for exploratory data analysis and data profiling. Pydata-visualizer automatically analyzes your dataset, generates interactive visualizations, and provides detailed statistical insights with minimal code.
Features
- 📊 Comprehensive Data Profiling: Analyze numerical, categorical, boolean, and string data types
- 🔍 Automated Data Quality Checks: Detect missing values, outliers, skewed distributions, and more
- 📈 Interactive Visualizations: Generate distribution plots, correlation heatmaps, and statistical charts
- 📝 Rich HTML Reports: Export analysis to visually appealing and shareable HTML reports
- ⚡ Performance Optimized: Fast analysis even on large datasets, with a minimal mode (minimal=True) for a quicker, reduced analysis
- 🔄 Correlation Analysis: Calculate Pearson, Spearman, and Cramér's V correlations between variables
Installation
pip install pydata-visualizer
Quick Start
import pandas as pd
from data_visualizer.profiler import AnalysisReport
# Load your dataset
df = pd.read_csv("your_dataset.csv")
# Create a report with default settings
report = AnalysisReport(df)
report.to_html("report.html")
Advanced Usage
Customizing Analysis Settings
from data_visualizer.profiler import AnalysisReport, Settings

# Configure analysis settings
report_settings = Settings(
    minimal=False,           # Set to True for faster, minimal analysis
    top_n_values=5,          # Show top 5 values in categorical columns
    skewness_threshold=2.0   # Tolerance for skewness alerts
)

# Create report with custom settings
report = AnalysisReport(df, settings=report_settings)

# Perform analysis and get results dictionary
results = report.analyse()

# Generate HTML report
report.to_html("custom_report.html")
Report Structure
The generated report includes:
- Overview: Dataset dimensions, missing values, duplicate rows
- Variable Analysis: Detailed per-column statistics and visualizations
- Sample Data: Head and tail samples of the dataset
- Correlations: Correlation matrices and heatmaps
API Reference
AnalysisReport Class
class AnalysisReport:
    def __init__(self, data, settings=None):
        """
        Initialize the analysis report object.

        Parameters:
        -----------
        data : pandas.DataFrame
            The dataset to analyze
        settings : Settings, optional
            Configuration settings for the analysis
        """

    def analyse(self):
        """
        Perform the data analysis.

        Returns:
        --------
        dict
            A dictionary containing all analysis results
        """

    def to_html(self, filename="report.html"):
        """
        Generate an HTML report from the analysis.

        Parameters:
        -----------
        filename : str, optional
            Path to save the HTML report (default: "report.html")
        """
Settings Class
class Settings(pydantic.BaseModel):
    """
    Settings for the analysis report.

    Attributes:
    -----------
    minimal : bool, default=False
        Whether to perform minimal analysis
    top_n_values : int, default=10
        Number of top values to show for categorical columns
    skewness_threshold : float, default=1.0
        Threshold for skewness alerts
    """
Type Analyzers
The library automatically detects and applies the appropriate analysis for different data types:
- Numeric: Statistical measures, distribution plots, skewness, kurtosis
- Categorical/String: Value counts, cardinality, frequency distributions
- Boolean: Value counts and proportions
- Generic: Basic analysis for unrecognized types
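To make the dispatch idea concrete, here is a minimal sketch of type detection in plain Python. It is not the library's implementation: the function name infer_kind, the category labels, and the rule that missing entries are ignored are all assumptions made for illustration.

```python
from numbers import Number


def infer_kind(values):
    """Classify a column as boolean, numeric, categorical, or generic (hypothetical helper)."""
    present = [v for v in values if v is not None]  # ignore missing entries
    if not present:
        return "generic"
    # bool is a subclass of int in Python, so booleans must be checked first.
    if all(isinstance(v, bool) for v in present):
        return "boolean"
    if all(isinstance(v, Number) and not isinstance(v, bool) for v in present):
        return "numeric"
    if all(isinstance(v, str) for v in present):
        return "categorical"
    return "generic"  # mixed or unrecognized types fall back to basic analysis


columns = {
    "age": [31, 45, None, 27],
    "city": ["Pune", "Mumbai", "Pune", None],
    "active": [True, False, True, True],
    "payload": [{"a": 1}, [1, 2], None, 3.5],
}
kinds = {name: infer_kind(vals) for name, vals in columns.items()}
print(kinds)
```

Checking booleans before numbers matters because True and False would otherwise pass the numeric test.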
Correlation Analysis
Three correlation methods are calculated when possible:
- Pearson: Linear correlation between numerical variables
- Spearman: Rank correlation capturing monotonic relationships
- Cramér's V: Measure of association between categorical variables
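The three measures above can be sketched in plain Python as follows. This is an illustrative sketch, not the library's code: the function names are hypothetical, inputs are assumed to be equal-length lists without missing values, numeric inputs are assumed to have non-zero variance, and Cramér's V is computed without bias correction.

```python
import math
from collections import Counter


def pearson(x, y):
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def cramers_v(a, b):
    """Cramér's V from the chi-squared statistic of the contingency table."""
    n = len(a)
    joint, row, col = Counter(zip(a, b)), Counter(a), Counter(b)
    chi2 = 0.0
    for r in row:
        for c in col:
            expected = row[r] * col[c] / n
            chi2 += (joint[(r, c)] - expected) ** 2 / expected
    k = min(len(row), len(col)) - 1
    if k == 0:  # a constant variable carries no association information
        return float("nan")
    return math.sqrt(chi2 / (n * k))


x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]  # monotonic but not linear in x
print(round(pearson(x, y), 3), round(spearman(x, y), 3))
print(cramers_v(["a", "a", "b", "b"], ["x", "x", "y", "y"]))
```

In the example, y grows monotonically but not linearly with x, so the Spearman value reaches 1 (up to floating-point rounding) while the Pearson value stays slightly below it.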
Data Quality Alerts
The library automatically detects potential issues in your data:
- High Missing Values: Columns with significant missing data
- Skewness: Highly skewed distributions
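As a rough illustration of how such alerts could be derived, the sketch below flags a column by its missing-value ratio and its skewness. Everything here is an assumption for illustration: the helper names, the alert labels, the 0.2 missing-value cutoff, and the use of population skewness (m3 / m2^1.5, which differs slightly from the bias-corrected estimate pandas reports). Only the skewness default of 1.0 comes from the Settings documentation.

```python
import math


def _is_missing(v):
    return v is None or (isinstance(v, float) and math.isnan(v))


def missing_ratio(values):
    """Fraction of entries that are None or NaN."""
    if not values:
        return 0.0
    return sum(1 for v in values if _is_missing(v)) / len(values)


def skewness(values):
    """Population skewness of the non-missing values: m3 / m2**1.5."""
    xs = [v for v in values if not _is_missing(v)]
    if not xs:
        return 0.0
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    if m2 == 0:  # constant column: treat as not skewed
        return 0.0
    return m3 / m2 ** 1.5


def column_alerts(values, missing_threshold=0.2, skewness_threshold=1.0):
    """Return hypothetical alert labels for one numeric column."""
    alerts = []
    if missing_ratio(values) > missing_threshold:
        alerts.append("high_missing")
    if abs(skewness(values)) > skewness_threshold:
        alerts.append("skewed")
    return alerts


incomes = [1, 1, 1, 2, 2, 3, 50, None, None, None]
print(column_alerts(incomes))
print(column_alerts([1, 2, 3, 4, 5]))
```

The first column has 30% missing entries and one large value pulling the tail to the right, so both alerts fire; the second is complete and symmetric, so none do.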
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Credits
Created by Aditya Deshmukh (adideshmukh2005@gmail.com)
Download files
File details
Details for the file pydata_visualizer-0.2.3.tar.gz.
File metadata
- Download URL: pydata_visualizer-0.2.3.tar.gz
- Upload date:
- Size: 16.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7c1c67d5c8fcb34ee3b2de1eb88bfc51d409cae29b9416df6ed9acc65c53f9a3 |
| MD5 | 853b6e97052b00e12a80676a36236f2d |
| BLAKE2b-256 | 88d7657331aae068b634eca9a8c99cf52b435246882c340eefe4a1e196ab81ce |
File details
Details for the file pydata_visualizer-0.2.3-py3-none-any.whl.
File metadata
- Download URL: pydata_visualizer-0.2.3-py3-none-any.whl
- Upload date:
- Size: 18.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1ca86819f3a9373a8cf5fbc477c29b440a5a754b6e8a46202d71f1979c8a2f50 |
| MD5 | 02cf563a02794957f1640103137e0e69 |
| BLAKE2b-256 | b9d0a3d4eacc0e5a5f42d7839322d87e76b39887287664e0a0625925281f6bb3 |