Package for ANCOVA analysis and visualization.
README for ANCOVA Analysis Script
Overview
This script provides tools for performing ANCOVA (Analysis of Covariance) and related statistical analyses. Its primary function, do_ancova, integrates the steps of an ANCOVA analysis and allows flexible customization of inputs and outputs, including graphical representations of the results.
Key Functionality: do_ancova
The do_ancova function performs a parametric or non-parametric ANCOVA on a dataset. It accepts a pandas DataFrame containing the dependent variable, the categorical variables, and the covariates, and tests for differences in the dependent variable between categories while adjusting for the covariates.
Features:
- Parametric and Non-Parametric ANCOVA: Automatically switches between parametric and ranked (non-parametric) ANCOVA depending on whether the assumptions of normality and homoscedasticity hold.
- Interaction Effects: Allows inclusion of interactions between variables.
- Post-Hoc Analysis: Automatically performs Tukey or Dunn post-hoc tests when significant differences are found between groups.
- Data Visualization: Generates boxplots and scatterplots with regression lines, including statistical significance indicators.
- Customizable Options: Users can customize interactions, colors, and plot details.
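The switch between parametric and ranked ANCOVA described in the first feature can be sketched as follows. This is a minimal illustration, not the package's internal code: the helper name choose_ancova_model, the use of Shapiro-Wilk and Levene tests on the residuals, the 0.05 threshold, and the column names are all assumptions made for the example. The ranked branch rank-transforms both the response and the covariate, which is one common form of ranked ANCOVA.

```python
import numpy as np
import pandas as pd
from scipy import stats
import statsmodels.formula.api as smf


def choose_ancova_model(df, response, group, covariate, alpha=0.05):
    """Fit a parametric ANCOVA, or a ranked one if assumptions look violated.

    Hypothetical helper for illustration; not part of the package API.
    """
    formula = f"{response} ~ C({group}) + {covariate}"
    model = smf.ols(formula, data=df).fit()

    # Normality of the residuals (Shapiro-Wilk test).
    _, p_normal = stats.shapiro(model.resid)
    # Equal residual variance across groups (Levene test).
    resid_by_group = [model.resid[df[group] == g] for g in df[group].unique()]
    _, p_equal_var = stats.levene(*resid_by_group)

    if p_normal > alpha and p_equal_var > alpha:
        return "parametric", model

    # Ranked ANCOVA: rank-transform response and covariate, then refit.
    ranked = df.copy()
    ranked[response] = ranked[response].rank()
    ranked[covariate] = ranked[covariate].rank()
    return "ranked", smf.ols(formula, data=ranked).fit()


# Small invented dataset: three groups, one covariate, normal noise.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "group": np.repeat(["a", "b", "c"], 40),
    "x": rng.uniform(20, 70, size=120),
})
demo["y"] = 1000 - 3 * demo["x"] + rng.normal(0, 50, size=120)

kind, fitted = choose_ancova_model(demo, "y", "group", "x")
print(kind)
print(fitted.params)
```

Column names here are plain identifiers so they can be used directly in the formula; names containing spaces would need quoting with Q("...") in a statsmodels formula.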
Usage: do_ancova
Parameters:
- data: A pandas DataFrame containing:
  - Column 1: Dependent (response) variable.
  - Columns 2 to categories + 1: Categorical independent variable(s).
  - Remaining columns: Continuous covariates.
- interactions (Optional): Specifies interactions between variables:
  - "ALL": Includes all interactions.
  - list: List of tuples specifying interacting variables.
- plot (Default: False): If True, generates a regression plot and a boxplot.
- save_plot (Default: False): If provided with a file path, saves the generated plots to the specified location.
- covariate_to_plot (Optional): Specifies the covariate to display in plots.
- palette (Optional): A dictionary mapping categorical levels to colors.
- categories (Default: 1): Number of categorical variables.
- ax (Optional): A Matplotlib axis for custom plotting.
- y_lab (Optional): Label for the y-axis in the generated plot. Default is False (no label).
- x_lab (Optional): Label for the x-axis in the generated plot. Default is False (no label).
- sum_of_squares_type (Optional): Type of sums of squares used for the ANCOVA. Default is 2 (Type II).
Output:
- Results:
  - A summary DataFrame with the ANCOVA parameters and outcomes.
  - An ANCOVA table with p-values for each effect.
  - Post-hoc results (if applicable).
- Plots:
  - A scatterplot with regression lines for the covariate, plus a boxplot for the main categorical comparisons.
  - A Matplotlib axis with a boxplot for categorical comparisons, returned when ax is provided so that it can be customized further.
- Files (Optional): Saves plots to the specified file path if save_plot is provided.
Dependencies
The script relies on the following Python packages:
- numpy
- pandas
- statsmodels
- scipy
- seaborn
- matplotlib
- scikit_posthocs
Install these dependencies using:
pip install numpy pandas statsmodels scipy seaborn matplotlib scikit-posthocs
Notes
- Ensure that your dataset has the shape cases × variables: one row per case and one column per variable.
- The script assumes the columns are ordered as follows: [Response variable, Main category to compare, Other categorical covariables (optional), Other continuous covariates].
- For multiple categorical variables, specify the number using the categories parameter.
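Because the function relies on column position, it can help to reorder a DataFrame explicitly before calling it. The helper below is a small sketch of that step; order_columns is a hypothetical name that is not part of the package, and the toy values are invented.

```python
import pandas as pd


def order_columns(df, response, main_category, other_categories=(), covariates=()):
    """Return the reordered DataFrame and the number of categorical columns.

    Hypothetical helper for illustration; not part of the package.
    """
    categorical = [main_category, *other_categories]
    ordered = [response, *categorical, *covariates]
    missing = [c for c in ordered if c not in df.columns]
    if missing:
        raise KeyError(f"Columns not found: {missing}")
    return df[ordered], len(categorical)


# Toy data with the columns in an arbitrary order.
raw = pd.DataFrame({
    "Age": [34, 51, 47],
    "Sex": ["Female", "Male", "Female"],
    "Number of T Cells": [980, 760, 905],
    "HIV Status": ["HIV-", "HIV+ (Untreated)", "HIV-"],
})
tidy, n_categories = order_columns(
    raw, "Number of T Cells", "HIV Status", other_categories=["Sex"], covariates=["Age"]
)
print(list(tidy.columns))
print(n_categories)
```

The second return value is the number to pass as the categories parameter.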
AN EXAMPLE OF USE:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Import the main function from our package
from Ancova_analysis import do_ancova
This invented dataset contains 150 entries with the following columns:
- Number of T Cells: The number of T cells, which depends on the individual's age and HIV status. HIV+ (Untreated) individuals have a large reduction in T cells, while HIV+ (TAR Treatment) individuals have a minimal reduction compared to HIV- individuals.
- HIV Status: A categorical variable representing the individual's HIV status. It takes three values:
  - HIV- (no HIV)
  - HIV+ (TAR Treatment) (HIV positive, receiving treatment)
  - HIV+ (Untreated) (HIV positive, not receiving treatment)
- Sex: The individual's sex, either Male or Female. It has no effect on the simulated T cell counts.
- Age: The individual's age in years, from 20 to 69 (np.random.randint excludes the upper bound of 70).
The Number of T Cells decreases with age (3 cells per year in the code below), and the reduction due to HIV status is largest for HIV+ (Untreated) individuals (200 cells, versus 30 for treated individuals).
# Set the seed for reproducibility
np.random.seed(4)
# Number of samples
n = 150
# Categorical variables
sex = np.random.choice(['Male', 'Female'], size=n)
hiv_status = np.random.choice(['HIV-', 'HIV+ (TAR Treatment)', 'HIV+ (Untreated)'], size=n, p=[0.4, 0.3, 0.3])
# Covariate: Age
age = np.random.randint(20, 70, size=n)
# Generate T cell count
t_cells = []
for i in range(n):
    base_t_cells = 1000  # General base for T cells
    age_effect = -3 * (age[i] - 30)  # Mild effect of age
    if hiv_status[i] == 'HIV+ (Untreated)':
        hiv_effect = -200  # Significant reduction for untreated
    elif hiv_status[i] == 'HIV+ (TAR Treatment)':
        hiv_effect = -30  # Minimal reduction for treated
    else:
        hiv_effect = 0  # No effect for HIV-
    noise = np.random.normal(0, 50)  # Random noise
    t_cells.append(base_t_cells + age_effect + hiv_effect + noise)
# Define a palette with the plotting color for each category; otherwise colors are assigned automatically
palette = {"HIV-": "skyblue",
           "HIV+ (Untreated)": "salmon",
           "HIV+ (TAR Treatment)": "orange"}
# Create the DataFrame
data_hiv = pd.DataFrame({
    'Number of T Cells': np.round(t_cells).astype(int),
    'HIV Status': hiv_status,
    'Sex': sex,
    'Age': age
})
data_hiv.head()
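As a quick sanity check on the simulated design, the same effect sizes can be applied to whole arrays and the group means inspected. This is a sketch only: it uses NumPy's newer Generator API (np.random.default_rng), so it does not reproduce the exact values produced by the seeded loop above, and the variable names (status, sketch, group_means) are chosen for this example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
n = 150

status = rng.choice(
    ["HIV-", "HIV+ (TAR Treatment)", "HIV+ (Untreated)"], size=n, p=[0.4, 0.3, 0.3]
)
age = rng.integers(20, 70, size=n)  # upper bound excluded: ages 20 to 69

# Same effect sizes as the loop version, applied to whole arrays at once.
hiv_effect = pd.Series(status).map(
    {"HIV-": 0, "HIV+ (TAR Treatment)": -30, "HIV+ (Untreated)": -200}
).to_numpy()
t_cells = 1000 - 3 * (age - 30) + hiv_effect + rng.normal(0, 50, size=n)

sketch = pd.DataFrame({
    "Number of T Cells": np.round(t_cells).astype(int),
    "HIV Status": status,
    "Age": age,
})

# Unadjusted mean T cell count per HIV status group.
group_means = sketch.groupby("HIV Status")["Number of T Cells"].mean()
print(group_means.round(1))
```

These are raw group means, not covariate-adjusted ones; the ANCOVA is what adjusts the comparison for age.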
Let's see whether the ANCOVA analysis is able to capture these differences:
from IPython.display import display  # display() is built in when running inside Jupyter

# Run the main function and display the results
df_results, ancova_summary, post_hoc = do_ancova(
    data=data_hiv,
    palette=palette,
    categories=2,  # HIV Status and Sex
    interactions=[("HIV Status", "Age")],  # Test the significance of the interaction of these variables
    y_lab="CD4 T Cells (count)",  # Set the y-axis label
    plot=True,  # Create the plot
    save_plot="./Images/ANCOVA_Regression_boxplot.png",  # Saves the plot to that path
)
display(df_results)
display(ancova_summary)
display(post_hoc)
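For readers who want to see what a Tukey post-hoc comparison looks like on its own, here is a minimal sketch using statsmodels. It is not the package's code: the data are invented, the group labels are simplified, and the test is run on the raw response without adjusting for any covariate, which may differ from what do_ancova does internally. For the non-parametric case, scikit_posthocs offers posthoc_dunn as a comparable tool.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Invented data: three groups of 50 with different means and equal spread.
rng = np.random.default_rng(2)
group_means = {"negative": 1000, "treated": 970, "untreated": 800}
df = pd.DataFrame({"group": np.repeat(list(group_means), 50)})
df["cells"] = df["group"].map(group_means) + rng.normal(0, 50, size=len(df))

# All pairwise comparisons with family-wise error rate control at alpha.
tukey = pairwise_tukeyhsd(endog=df["cells"], groups=df["group"], alpha=0.05)
print(tukey.summary())

# One boolean per pair, in the same order as the summary rows.
print(tukey.reject)
```

Pairs are listed in alphabetical order of the group labels, so each entry of reject can be matched to a row of the summary table.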
# Create two subplots in a row
fig, axs = plt.subplots(ncols=2, figsize=(12, 6))
df_results, ancova_summary, post_hoc, ax = do_ancova(
    data=data_hiv,
    palette=palette,
    categories=2,
    y_lab="CD4 T Cells (count)",
    plot=True,
    ax=axs[0],  # When the axis is provided it returns the boxplot and can be integrated with other subplots as you wish
)
# Modify the df order to plot the sex differences
data_hiv_sex = data_hiv[['Number of T Cells', 'Sex', 'HIV Status', 'Age']]
df_results, ancova_summary, post_hoc, ax = do_ancova(
    data=data_hiv_sex,
    categories=2,
    y_lab="CD4 T Cells (count)",
    plot=True,
    ax=axs[1],  # The other subplot
)
# Save and show
plt.savefig("./Images/ANCOVA_two_boxplots.png", bbox_inches="tight")
plt.show()
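The ax parameter follows a common Matplotlib pattern: a plotting function draws on the axis it is given and returns it, so the caller controls the figure layout. The sketch below shows that pattern with plain Matplotlib. The function boxplot_on_axis and the toy data are hypothetical and only illustrate the idea; they are not how do_ancova draws its plots.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def boxplot_on_axis(df, response, group, ax=None):
    """Draw a per-group boxplot on ax, creating a figure only if none is given."""
    if ax is None:
        _, ax = plt.subplots()
    names = sorted(df[group].unique())
    ax.boxplot([df.loc[df[group] == g, response].to_numpy() for g in names])
    ax.set_xticks(list(range(1, len(names) + 1)))
    ax.set_xticklabels(names)
    ax.set_xlabel(group)
    ax.set_ylabel(response)
    return ax


# Invented data: two groups of 30 values each.
rng = np.random.default_rng(3)
toy = pd.DataFrame({
    "group": np.repeat(["a", "b"], 30),
    "y": np.concatenate([rng.normal(1000, 50, 30), rng.normal(800, 50, 30)]),
})

fig, axs = plt.subplots(ncols=2, figsize=(10, 4))
returned = boxplot_on_axis(toy, "y", "group", ax=axs[0])  # draws on the left panel
axs[1].hist(toy["y"], bins=15)  # the right panel stays free for anything else
returned.set_title("Customized after the call")
```

Because the same axis object comes back from the call, titles, limits, and annotations can be adjusted afterwards without touching the plotting function.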