
Package for ANCOVA analysis and visualization.

README for ANCOVA Analysis Script

Overview

This script provides tools for performing ANCOVA (Analysis of Covariance) and related statistical analyses. Its primary function, do_ancova, combines the individual steps of an ANCOVA into a single call and allows flexible customization of inputs and outputs, including graphical representations of the results.


Key Functionality: do_ancova

The main purpose of the do_ancova function is to perform parametric or non-parametric ANCOVA on a dataset. It accepts a DataFrame containing the dependent variable, one or more categorical variables, and the covariates, and tests for differences in the dependent variable between groups while adjusting for the covariates.

Features:

  • Parametric and Non-Parametric ANCOVA:
    Automatically switches between parametric and ranked (non-parametric) ANCOVA depending on whether the assumptions of normality and homoscedasticity hold.

  • Interaction Effects:
    Allows inclusion of interactions between variables.

  • Post-Hoc Analysis:
    Automatically performs Tukey or Dunn post-hoc tests when significant differences are found between groups.

  • Data Visualization:
    Generates boxplots and scatterplots with regression lines, including statistical significance indicators.

  • Customizable Options:
    Users can customize interactions, colors, and plot details.
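
The automatic switch between parametric and ranked ANCOVA can be illustrated with a minimal sketch. This is not the package's internal code: the helper name choose_ancova_model, the Shapiro-Wilk and Levene tests, the 0.05 threshold, and the rank transform of both response and covariate are all assumptions chosen for illustration, and the column names are kept free of spaces so they can be used directly in a statsmodels formula.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats


def choose_ancova_model(df, response, group, covariate, alpha=0.05):
    """Hypothetical helper: fit a parametric ANCOVA and fall back to a
    rank-transformed one if the residuals fail the assumption checks."""
    formula = f"{response} ~ C({group}) + {covariate}"
    model = smf.ols(formula, data=df).fit()

    # Normality of residuals (Shapiro-Wilk)
    _, p_normal = stats.shapiro(model.resid)
    # Equal residual variance across groups (Levene)
    per_group = [model.resid[df[group] == level] for level in df[group].unique()]
    _, p_levene = stats.levene(*per_group)

    if p_normal > alpha and p_levene > alpha:
        return "parametric", model

    # One common rank-based approach: rank response and covariate, then refit
    ranked = df.copy()
    ranked[response] = ranked[response].rank()
    ranked[covariate] = ranked[covariate].rank()
    return "ranked", smf.ols(formula, data=ranked).fit()


# Invented data with strongly skewed noise, so the normality check should fail
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "status": np.repeat(["a", "b", "c"], 50),
    "age": rng.integers(20, 70, size=150),
})
demo["t_cells"] = 1000 - 3 * demo["age"] + 50 * rng.lognormal(0, 1.5, size=150)

kind, fitted = choose_ancova_model(demo, "t_cells", "status", "age")
print(kind)
print(fitted.params)
```

Whether do_ancova uses these particular tests or thresholds is not stated in this README, so treat the sketch only as a way to reason about what "depending on the assumptions" means in practice.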


Usage: do_ancova

Parameters:

  • data:
    A pandas DataFrame containing:

    • Column 1: Dependent (response) variable.
    • Column 2 onward (as many columns as given by categories): Categorical independent variable(s).
    • Remaining columns: Continuous covariates.
  • interactions (Optional):
    Specifies interactions between variables:

    • "ALL": Includes all interactions.
    • list: List of tuples specifying interacting variables.
  • plot (Default: False):
    If True, generates a regression plot and a boxplot.

  • save_plot (Default: False):
    If provided with a file path, saves the generated plots to the specified location.

  • covariate_to_plot (Optional):
    Specifies the covariate to display in plots.

  • palette (Optional):
    A dictionary mapping categorical levels to colors.

  • categories (Default: 1):
    Number of categorical variables.

  • ax (Optional):
    A Matplotlib axis for custom plotting.

  • y_lab (Optional): Label for the y-axis in the generated plot. Default is False (no label).

  • x_lab (Optional): Label for the x-axis in the generated plot. Default is False (no label).

  • sum_of_squares_type (Optional): Specifies the type of sums of squares for ANCOVA. Default is Type 2 (value = 2).

Output:

  1. Results:

    • A summary data frame with the ANCOVA parameters and outcomes.
    • An ANCOVA table with p-values for each effect.
    • Post-hoc results (if applicable).
  2. Plots:

    • Scatterplot with regression lines for the covariate, plus a boxplot for the main categorical comparisons.
    • A Matplotlib axis with a Boxplot for categorical comparisons (allows customizing).
  3. Files (Optional):
    Saves plots to the specified file path if save_plot is provided.
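
As a rough picture of what the post-hoc results contain, the sketch below runs a Tukey HSD test with statsmodels on invented group data. It is an assumption-laden illustration rather than the package's own routine: it compares raw group values without any covariate adjustment, and the group labels and effect sizes are made up.

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Invented measurements for three groups (30 cases each)
rng = np.random.default_rng(2)
labels = np.repeat(["HIV-", "treated", "untreated"], 30)
values = np.concatenate([
    rng.normal(1000, 50, 30),  # reference group
    rng.normal(970, 50, 30),   # small reduction
    rng.normal(800, 50, 30),   # large reduction
])

# All pairwise comparisons with family-wise error control at alpha = 0.05
tukey = pairwise_tukeyhsd(endog=values, groups=labels, alpha=0.05)
print(tukey.summary())
```

Each row of the summary is one pair of groups, with the mean difference, an adjusted p-value, and a reject flag.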

Dependencies

The script relies on the following Python packages:

  • numpy
  • pandas
  • statsmodels
  • scipy
  • seaborn
  • matplotlib
  • scikit_posthocs

Install these dependencies using:

pip install numpy pandas statsmodels scipy seaborn matplotlib scikit-posthocs

Notes

  • Ensure that your dataset has the shape cases × variables (one row per case, one column per variable).
  • The script assumes the columns are ordered as follows: [Response variable, Main category to compare, Other categorical co-variables (optional), Other continuous co-variables].
  • For multiple categorical variables, specify the number using the categories parameter.
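
Because the function relies on column position, it can help to reorder the DataFrame explicitly before calling it. The helper below is a hypothetical convenience written with pandas only; order_for_ancova is not part of the package, and it simply returns the reordered frame together with the value to pass as categories.

```python
import pandas as pd


def order_for_ancova(df, response, main_category, other_categories=(), covariates=()):
    """Hypothetical helper: arrange columns as
    [response, main category, other categories..., covariates...]."""
    columns = [response, main_category, *other_categories, *covariates]
    missing = [c for c in columns if c not in df.columns]
    if missing:
        raise KeyError(f"Columns not found: {missing}")
    n_categories = 1 + len(other_categories)
    return df[columns], n_categories


# Small invented frame whose columns are in the wrong order
raw = pd.DataFrame({
    "Age": [34, 51, 27],
    "Sex": ["Male", "Female", "Female"],
    "Number of T Cells": [980, 760, 1010],
    "HIV Status": ["HIV-", "HIV+ (Untreated)", "HIV-"],
})

ordered, n_categories = order_for_ancova(
    raw,
    response="Number of T Cells",
    main_category="HIV Status",
    other_categories=["Sex"],
    covariates=["Age"],
)
print(list(ordered.columns), n_categories)
```

The returned frame and count could then be passed as data and categories to do_ancova.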

AN EXAMPLE OF USE:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Import the main function from our package
from Ancova_analysis import do_ancova

This invented dataset contains 150 entries with the following columns:

  • Number of T Cells: The number of T cells, which is affected by the individual's age and HIV status. Individuals with HIV+ (Untreated) have a significant reduction in T cells, while HIV+ (TAR Treatment) individuals have a minimal reduction compared to HIV- individuals.

  • HIV Status: A categorical variable representing the individual's HIV status. It can take three values:

      -> HIV- (no HIV)
    
      -> HIV+ (TAR Treatment) (HIV positive, receiving treatment)
    
      -> HIV+ (Untreated) (HIV positive, not receiving treatment)
    
  • Sex: The individual's sex, either Male or Female.

  • Age: The individual's age, ranging from 20 to 69 years (the upper bound of 70 passed to np.random.randint is exclusive).

The Number of T Cells decreases with age, and the reduction is more significant for individuals with HIV+ (Untreated).

# Set the seed for reproducibility
np.random.seed(4)

# Number of samples
n = 150

# Categorical variables
sex = np.random.choice(['Male', 'Female'], size=n)
hiv_status = np.random.choice(['HIV-', 'HIV+ (TAR Treatment)', 'HIV+ (Untreated)'], size=n, p=[0.4, 0.3, 0.3])

# Covariate: Age
age = np.random.randint(20, 70, size=n)

# Generate T cell count
t_cells = []
for i in range(n):
    base_t_cells = 1000  # General base for T cells
    age_effect = -3 * (age[i] - 30)  # Mild effect of age
    if hiv_status[i] == 'HIV+ (Untreated)':
        hiv_effect = -200  # Significant reduction for untreated
    elif hiv_status[i] == 'HIV+ (TAR Treatment)':
        hiv_effect = -30  # Minimal reduction for treated
    else:
        hiv_effect = 0  # No effect for HIV-
    noise = np.random.normal(0, 50)  # Random noise
    t_cells.append(base_t_cells + age_effect + hiv_effect + noise)

# Define a palette to select the plotting colors for each category, else it would be randomly assigned
palette = {"HIV-":"skyblue",
           "HIV+ (Untreated)":"salmon",
           "HIV+ (TAR Treatment)":"orange"}


# Create the DataFrame
data_hiv = pd.DataFrame({
    'Number of T Cells': np.round(t_cells).astype(int),
    'HIV Status': hiv_status,
    'Sex': sex,
    'Age': age
})

data_hiv.head()

Let's see whether the ANCOVA analysis is able to capture these differences:

# Run the main function and display the results

df_results, ancova_summary,post_hoc = do_ancova(data=data_hiv,
                                                palette=palette,
                                                categories=2, # HIV Status and Sex
                                                interactions=[('HIV Status',"Age")], # Test the significance of the interaction of these variables
                                                y_lab="CD4 T Cells (count)",# Set the y_label 
                                                plot=True, # Create the plot
                                                save_plot="./Images/ANCOVA_Regression_boxplot.png" # Saves the plot to that path
                                                ) 

# display() is available in Jupyter/IPython; use print() in a plain script
display(df_results)
display(ancova_summary)
display(post_hoc)

Example Plot

# Create two subplots in a row
fig, axs = plt.subplots(ncols=2,figsize=(12,6))


df_results, ancova_summary,post_hoc,ax= do_ancova(data=data_hiv,palette=palette,categories=2, y_lab="CD4 T Cells (count)",plot=True,
          ax=axs[0] # When the axis is provided it returns the boxplot and can be integrated with other subplots as you wish
          )

# Modify the df order to plot the sex differences
data_hiv_sex = data_hiv[['Number of T Cells','Sex','HIV Status','Age']]

df_results, ancova_summary,post_hoc,ax= do_ancova(data=data_hiv_sex,categories=2, y_lab="CD4 T Cells (count)",plot=True,
          ax=axs[1], # The other subplot

          )
# Save and show
plt.savefig("./Images/ANCOVA_two_boxplots.png",bbox_inches="tight")
plt.show()

Example Plot 2
