An interactive data profiling library for Python notebooks with rich HTML reports and PDF export capabilities

Pytics

An interactive data profiling library for Python that generates comprehensive HTML reports with rich visualizations and PDF export capabilities.

Features

  • 📊 Interactive Visualizations: Built with Plotly for dynamic, interactive charts
  • 📱 Responsive Design: Reports adapt to different screen sizes
  • 📄 PDF Export: Generate publication-ready PDF reports
  • 🎯 Target Analysis: Special insights for classification/regression tasks
  • 🔍 Comprehensive Profiling: Detailed statistics and distributions
  • ⚡ Performance Optimized: Efficient handling of large datasets
  • 🛠️ Customizable: Configure sections and visualization options

Example Reports

Full Profile Report

Targeted Analysis Report

Installation

pip install pytics

Quick Start

import pandas as pd
from pytics import profile

# Load your dataset
df = pd.read_csv('your_data.csv')

# Generate an HTML report
profile(df, output_file='report.html')

# Generate a PDF report
profile(df, output_format='pdf', output_file='report.pdf')

# Profile with a target variable
profile(df, target='target_column', output_file='report.html')

# Select specific sections
profile(
    df,
    include_sections=['overview', 'correlations'],
    output_file='report.html'
)

Report Sections

  1. Overview

    • Dataset summary
    • Memory usage
    • Data types distribution
    • Missing values summary
  2. Variable Analysis

    • Detailed statistics
    • Distribution plots
    • Missing value patterns
    • Unique values analysis
  3. Correlations

    • Correlation matrix
    • Feature relationships
    • Interactive heatmaps
  4. Target Analysis (when target specified)

    • Target distribution
    • Feature importance
    • Target correlations

Configuration Options

profile(
    df,
    target='target_column',             # Target variable for supervised learning
    include_sections=['overview'],      # Sections to include
    exclude_sections=['correlations'],  # Sections to exclude
    output_format='pdf',                # 'html' or 'pdf'
    output_file='report.pdf',           # Output file path (extension matches output_format)
    theme='light',                      # Report theme
    title='Custom Report Title'         # Report title
)
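
Because output_format and output_file are set independently, it is easy to request a PDF while writing to a .html path. The sketch below is one way to keep the two in step before calling profile. The helper report_path is hypothetical and not part of pytics, and the 'html' default is an assumption based on the Quick Start calls.

```python
from pathlib import Path

# Formats named in the configuration options above.
SUPPORTED_FORMATS = ("html", "pdf")


def report_path(output_file: str, output_format: str = "html") -> str:
    """Return output_file with its extension forced to match output_format.

    Hypothetical helper, not part of pytics.
    """
    if output_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported output_format: {output_format!r}")
    # with_suffix replaces an existing extension or appends one if missing.
    return str(Path(output_file).with_suffix(f".{output_format}"))


print(report_path("report.html", "pdf"))  # report.pdf
```

The returned string can then be passed as output_file alongside the same output_format value.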

Edge Cases and Limitations

Data Size Limits

  • Recommended maximum rows: 1 million
  • Recommended maximum columns: 1000
  • Large datasets may require increased memory allocation
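
A minimal sketch of checking a DataFrame against the recommended limits above and sampling rows down before profiling. It uses only pandas. The helper names are hypothetical, and whether pytics samples internally is not stated here, so this is done by the caller.

```python
import pandas as pd

MAX_ROWS = 1_000_000  # recommended maximum rows, from the list above
MAX_COLS = 1_000      # recommended maximum columns, from the list above


def within_recommended_limits(df, max_rows=MAX_ROWS, max_cols=MAX_COLS):
    """Return True if the frame fits the recommended row and column limits."""
    rows, cols = df.shape
    return rows <= max_rows and cols <= max_cols


def downsample_rows(df, max_rows=MAX_ROWS, random_state=0):
    """Return df unchanged if small enough, else a reproducible random sample."""
    if len(df) <= max_rows:
        return df
    return df.sample(n=max_rows, random_state=random_state)
```

Fixing random_state keeps the sample, and therefore the report, reproducible between runs.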

Special Cases

  • Missing Values: Automatically handled and reported
  • Categorical Variables: Limited to 1000 unique values by default
  • Date/Time: Automatically detected and analyzed
  • Mixed Data Types: Handled with appropriate warnings
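
To see in advance which columns would exceed the default limit of 1000 unique values, something like the following could be run before profiling. It is a sketch that treats every non-numeric, non-datetime column as categorical, which is an assumption and may differ from how pytics classifies columns. The function name is hypothetical.

```python
import pandas as pd
from pandas.api import types as ptypes


def high_cardinality_columns(df, max_unique=1000):
    """Map column name -> unique count for categorical-like columns over the limit."""
    flagged = {}
    for name in df.columns:
        col = df[name]
        # Skip numeric and datetime columns; the rest are treated as categorical.
        if ptypes.is_numeric_dtype(col) or ptypes.is_datetime64_any_dtype(col):
            continue
        n_unique = col.nunique(dropna=True)
        if n_unique > max_unique:
            flagged[name] = n_unique
    return flagged
```

Columns it reports are candidates for dropping, bucketing, or excluding before the report is generated.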

Error Handling

  • Custom exceptions for clear error reporting
  • Warning system for non-critical issues
  • Graceful degradation for memory constraints
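
The exception classes and the warning mechanism are not named here. Assuming non-critical issues are emitted through Python's standard warnings module, a caller could collect them for logging with a wrapper like this one. The function name is hypothetical.

```python
import warnings


def call_and_collect_warnings(func, *args, **kwargs):
    """Call func and return (result, list of warning messages raised during the call)."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, including ones normally shown only once.
        warnings.simplefilter("always")
        result = func(*args, **kwargs)
    return result, [str(w.message) for w in caught]
```

For example, call_and_collect_warnings(profile, df, output_file='report.html') would return whatever profile returns together with any warning messages it raised, provided the assumption above holds.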

Best Practices

  1. Memory Management

    • Sample large datasets if needed
    • Use section selection for focused analysis
    • Monitor memory usage for big datasets
  2. Performance Optimization

    • Limit categorical variables when possible
    • Use targeted section selection
    • Consider data sampling for initial exploration
  3. Report Generation

    • Choose appropriate output format
    • Use meaningful report titles
    • Save reports with descriptive filenames
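
For the memory-monitoring advice above, pandas can report a frame's footprint directly. The sketch below returns the total size in MiB and the heaviest columns. The helper name is hypothetical, and the figures describe the DataFrame itself, not the memory pytics needs to build a report.

```python
import pandas as pd


def memory_report(df, top=5):
    """Return (total size in MiB, the `top` largest columns by bytes)."""
    # deep=True measures the actual contents of object/string columns.
    per_column = df.memory_usage(deep=True, index=False).sort_values(ascending=False)
    total_mib = per_column.sum() / 1024**2
    return total_mib, per_column.head(top)
```

Columns at the top of the returned Series are the first candidates for sampling or exclusion when memory is tight.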

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.
