🔬 Professional data inspection and visualization toolkit for Polars - featuring advanced X-ray analysis, comprehensive plotting, and data quality assessment.
Project description
🔬 Polarscope
Professional data inspection and visualization toolkit for Polars 🐻❄️
Polarscope is a modern, fast, and comprehensive data analysis library built exclusively for Polars. It provides advanced X-ray analysis, beautiful visualizations, and comprehensive data quality assessment - all with native Polars performance.
✨ Key Features
🔬 X-ray Analysis
xray(df)→ Comprehensive data quality assessment with beautiful Great Tables output- Advanced statistics: skewness, kurtosis, outliers, normality tests, data quality flags
- Performance metrics: execution timing and memory usage tracking
- Customizable output: minimal or expanded views with professional formatting
📊 Multi-Backend Visualization
corr_heatmap(df)→ Correlation analysis with advanced filtering and target analysisdist_plot(df)→ Distribution plotting with statistical overlaysmissingval_plot(df)→ Missing value pattern analysiscat_plot(df)→ Categorical data frequency analysiscorr_plot(df)→ Interactive correlation exploration- 3 backends supported: Plotly (default), Seaborn, Altair
🧹 Data Processing
clean_column_names(df)→ Normalize and deduplicate column namesdata_cleaning(df)→ Comprehensive automated data cleaning pipelineconvert_datatypes(df)→ Intelligent dtype optimizationdrop_missing(df)→ Advanced missing value handling
🛠️ Utilities
save_fig()→ Universal figure saving (PNG, HTML, SVG, etc.)- Native Polars performance - no Pandas dependency
- Type-safe with comprehensive docstrings
🚀 Quick Start
Installation
# Basic installation
pip install polarscope
# With all optional dependencies
pip install polarscope[all]
# Individual extras
pip install polarscope[plotly,seaborn,great_tables]
Basic Usage
import polars as pl
import polarscope as plc
# Load your data
df = pl.read_csv("data.csv")
# 🔬 Get comprehensive X-ray analysis
plc.xray(df)
# 📊 Create beautiful visualizations
plc.corr_heatmap(df, backend="plotly")
plc.dist_plot(df, column="price", backend="seaborn")
plc.missingval_plot(df, backend="altair")
# 🧹 Clean and optimize your data
df_clean = plc.data_cleaning(df)
df_optimized = plc.convert_datatypes(df_clean)
Advanced X-ray Analysis
# Expanded analysis with custom settings
plc.xray(
df,
expanded=True, # Show all statistics
corr_target="target_column", # Correlation to specific column
outlier_method="iqr", # Outlier detection method
decimals=3, # Formatting precision
compact=True # Compact number formatting
)
# Custom percentiles and quality thresholds
plc.xray(
df,
percentiles=[0.1, 0.25, 0.5, 0.75, 0.9],
missing_threshold=0.3, # Flag high missingness
outlier_threshold=0.05, # Flag outlier-heavy columns
normality_test=True, # Include normality tests
uniformity_test=True # Include uniformity tests
)
📈 Performance
Polarscope is built for speed and efficiency:
- Lightning fast: Native Polars operations throughout
- Memory efficient: Optimized for large datasets
- Scalable: Handles millions of rows with ease
- Professional: Comprehensive test suite with 100% pass rate
🎯 Why Polarscope?
Polars-Native
- Built exclusively for Polars - no Pandas dependencies
- Native performance and memory efficiency
- Type-safe operations with Polars' query engine
Professional Quality
- Production-ready with comprehensive testing
- Beautiful, customizable output with Great Tables
- Scientific precision in statistical analysis
Multi-Backend Flexibility
- Choose your preferred visualization backend
- Consistent API across Plotly, Seaborn, and Altair
- Easy switching between backends
Comprehensive Analysis
- Goes beyond basic
.describe()functionality - Advanced statistics and quality assessment
- Data quality flags and recommendations
📚 Documentation
Core Functions
| Function | Description | Key Features |
|---|---|---|
xray() |
Comprehensive data analysis | Statistics, quality flags, performance metrics |
corr_heatmap() |
Correlation visualization | Multiple backends, filtering, target analysis |
dist_plot() |
Distribution analysis | Statistical overlays, multiple backends |
missingval_plot() |
Missing value patterns | Percentage/absolute counts, pattern analysis |
data_cleaning() |
Automated cleaning | Duplicates, missing values, optimization |
Backends
- Plotly (default): Interactive, web-ready visualizations
- Seaborn: Statistical plotting with beautiful aesthetics
- Altair: Grammar of graphics approach
🤝 Contributing
Contributions are welcome! Please feel free to submit issues, feature requests, or pull requests.
📄 License
MIT License - see LICENSE file for details.
🙏 Acknowledgments
Inspired by klib and ydata-profiling, but built from the ground up for Polars performance and modern data science workflows.
🔬 Ready to X-ray your data? Install Polarscope today!
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file polarscope-1.0.0.tar.gz.
File metadata
- Download URL: polarscope-1.0.0.tar.gz
- Upload date:
- Size: 34.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
8be810a7f081121c43fa68ccaacc22872365b4afac053b128005012966f0cf51
|
|
| MD5 |
1123667e592e4299bd70c006f3994b0e
|
|
| BLAKE2b-256 |
8467ac9972632f8311b57cfb76a98957252e2a03b025e5f2c26214ba28237a12
|
File details
Details for the file polarscope-1.0.0-py3-none-any.whl.
File metadata
- Download URL: polarscope-1.0.0-py3-none-any.whl
- Upload date:
- Size: 35.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
1ae324df5ebf1c3efee5d79dbc3b16773cadd3a488cf00f4efa0066fee90a53e
|
|
| MD5 |
538702da1038466bed0f64f065443424
|
|
| BLAKE2b-256 |
c89fee668daeceb9eee40d61169f5a45b94d9b43e2a6f0d9976d9b19d810787e
|