A Python Library for Statistical Data Visualization

pltstat: A Python Library for Statistical Data Visualization

pltstat is a Python library designed to facilitate the visualization of statistical data analysis. This library includes a variety of tools and methods to streamline data exploration, statistical computation, and graphical representation.


Installation

Requirements

Before installing, make sure that you are using Python 3.12.
You can check your Python version by running:

python --version

If you do not have Python 3.12, you can download it from the official Python website.
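The same check can be done from inside Python, which is handy in a setup script. This is a minimal sketch: the helper name `is_supported` is hypothetical, and it assumes that "Python 3.12" means an exact match on the major and minor version.

```python
import sys

# Version named in this README; treated here as an exact major.minor match (assumption).
REQUIRED = (3, 12)


def is_supported(version_info=sys.version_info, required=REQUIRED):
    """Return True when the interpreter's major.minor pair equals `required`."""
    return tuple(version_info[:2]) == tuple(required)


if is_supported():
    print("Python version matches the requirement.")
else:
    found = ".".join(str(part) for part in sys.version_info[:3])
    print(f"Found Python {found}; this README asks for Python 3.12.")
```

If later releases of the library turn out to support newer interpreters, the comparison could be relaxed to `>=`.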

Installation

To install the pltstat library, simply run the following command:

pip install pltstat

This will install the library along with all the required dependencies as specified in the requirements.txt file.

After installing the package, you can start using pltstat by importing the necessary modules in your Python scripts.
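One quick way to confirm that the installation worked is to try importing each module. The sketch below is illustrative only: the helper `importable_submodules` is hypothetical, and the module names are taken from the file descriptions in this README.

```python
import importlib

# Module names as listed in this README's file descriptions.
PLTSTAT_MODULES = [
    "singlefeat", "twofeats", "multfeats", "circle",
    "cm", "stat_methods", "in_out",
]


def importable_submodules(package, names):
    """Return the entries of `names` that import cleanly as `package.<name>`."""
    found = []
    for name in names:
        try:
            importlib.import_module(f"{package}.{name}")
        except Exception:
            # Broad on purpose: a missing system dependency may surface
            # as something other than ImportError.
            continue
        found.append(name)
    return found


print(importable_submodules("pltstat", PLTSTAT_MODULES))
```

An empty list suggests the package is not installed in the active environment. A partial list points at a missing dependency of the modules that failed.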


File Descriptions

Python Modules

  • __init__.py

    • Marks the directory as a Python package. This file allows you to import modules from the pltstat package.
  • singlefeat.py

    • Dedicated to the analysis and visualization of single-variable features, including plotting functions such as pie charts, count plots, and histograms.
  • twofeats.py

    • Provides tools for analyzing interactions between two features. Includes functions for creating crosstabs, computing correlations, and visualizing results using violin plots, boxplots, and distribution box plots. These functions also display p-values and other statistical metrics to summarize relationships between the two features.
  • multfeats.py

    • Provides tools for analyzing relationships between multiple features.
      Includes visualization functions for analyzing missing data, comparing distributions, and visualizing dimensionality reductions. Additionally, it provides methods for creating heatmaps that display correlations and p-values, including Spearman's correlation, Mann-Whitney p-values, and Phik correlations.
  • circle.py

    • Contains functions and methods related to circular statistical visualizations, such as radar charts or circular histograms.
  • cm.py

    • Contains custom colormap utilities for visualizations, such as rendering correlation matrices or creating two-colored maps for p-values with a threshold (e.g., alpha).
  • stat_methods.py

    • Includes methods for calculating correlation matrices and related statistical relationships.
  • in_out.py

    • Provides utilities for reading, writing, and preprocessing input and output data files.
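Several of the modules above mention Spearman's correlation. As background, the sketch below shows what that statistic computes: the Pearson correlation of the ranks of the two variables. It is written in plain Python for illustration and is not pltstat's implementation. The function names are hypothetical, and it assumes neither input is constant.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# A monotonic but non-linear relationship still gives a rho of about 1.
print(spearman([1, 2, 3, 4, 5], [1, 4, 9, 16, 25]))
```

In practice pandas gives the full matrix in one call with `DataFrame.corr(method="spearman")`.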

Other Files

  • .gitignore

    • Specifies intentionally untracked files to ignore in the repository, such as virtual environments and temporary files.
  • README.md

    • This file provides an overview of the project, including file descriptions and usage instructions.
  • requirements.txt

    • Lists the Python dependencies required to run the library. Install them using:
      pip install -r requirements.txt
      

Getting Started

  1. Clone the repository:

    git clone https://github.com/trojanskehesten/pltstat.git
    
  2. Navigate to the project directory:

    cd pltstat
    
  3. Python Version: This library is compatible with Python 3.12. Ensure you have this version installed before running the project.

  4. R Installation: Ensure that the R language is installed on your system, as the rpy2 library (used in this project) requires it.

  5. Install dependencies:

    pip install -r requirements.txt
    
  6. Explore the modules and utilize the library in your projects.
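Step 4 can be checked from Python before installing the dependencies. This is a rough sketch with a hypothetical helper, `find_r`. It assumes that R is launched through an executable named `R` on your `PATH`, and that rpy2 may also consult the `R_HOME` environment variable; both may differ on your system.

```python
import os
import shutil


def find_r(executable="R"):
    """Return the full path of the R executable on PATH, or None if absent."""
    return shutil.which(executable)


r_path = find_r()
r_home = os.environ.get("R_HOME")  # assumption: rpy2 may use this if it is set

if r_path or r_home:
    print(f"R looks available (PATH: {r_path}, R_HOME: {r_home}).")
else:
    print("R was not found on PATH and R_HOME is not set; install R first.")
```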


Usage

Each module in pltstat is designed to be self-contained and reusable. Import the module you need and call its functions to visualize your statistical data.

Example 1: Pie Chart

import pandas as pd
from pltstat import singlefeat as sf

data = {
    "Age": [25, 30, 22, 27, 35],
    "A/B Test Group": ["A", "B", "A", "B", "A"],
}
df = pd.DataFrame(data)

# Plot a pie chart
sf.pie(df["A/B Test Group"])

Result 1
Pie plot example

Example 2: Boxplot

import pandas as pd
from pltstat import twofeats as tf

# Data creation:
data = {
    "gender": ["male", "female", "female", "male", "male", "female", "female", "male", "male", 
               "female", "male", "female", "male", "male", "female", "male", "female", "male", 
               "female", "male", "female", "male", "female", "male", "female", "male", "female", 
               "female", "male", "male", "male", "male"],
    "age": [22, 20, 17, 16, 19, 17, 11, 29, 24, 12, 22, 20, 19, 16, 11, 29, 24, 20, 16, 22, 
            17, 29, 24, 16, 17, 29, 22, 19, 22, 22, 24, 29]
}

df = pd.DataFrame(data)

# Boxplot creation:
tf.boxplot(df, "gender", "age")

Result 2
Boxplot example

Example 3: Boxplot and Distribution Plot

import numpy as np
import pandas as pd
from pltstat import twofeats as tf

# Example DataFrame
np.random.seed(42)
df = pd.DataFrame({
    'category': np.random.choice(['A', 'B'], size=100),
    'value': np.random.randn(100),
})

# Create a boxplot and a distribution plot
tf.dis_box_plot(df, cat_feat='category', num_feat='value')

Result 3
Distribution boxplot example


Contributing

Contributions are welcome! If you'd like to improve the library or fix issues, please:

  1. Fork the repository.
  2. Create a new branch.
  3. Make your changes and commit them.
  4. Submit a pull request.

License

This project is licensed under the BSD 3-Clause License. See the LICENSE file for details.
