A package implementing a Principal Components Analysis (PCA)-based approach to identify biological pathways driving inter-tumour heterogeneity in gene regulatory networks.

These details have not been verified by PyPI

Project links

homepage

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

PORCUPINE

Principal Components Analysis to Obtain Regulatory Contributions Using Pathway-based Interpretation of Network Estimates is an R package to identify biological pathway which drive inter-tumour heterogeneity in a population of gene regulatory networks. It is a Principal Components Analysis (PCA)-based approach that can be used to determine whether a specific set of variables—for example a set of genes in a specific pathway—have coordinated variability in their regulation.

Method

PORCUPINE uses as input individual patient networks, for example networks modeled using PANDA and LIONESS, as well as a .gmt file that includes biological pathways and the genes belonging to them. For each pathway, it extracts all edges connected to the genes belonging to that pathway and scales each edge across individuals. It then performs a PCA analysis on these edge weights, as well as on a null background that is based on random pathways. To identify significant pathways, PORCUPINE applies a one-tailed t-test and calculates the effect size (ES). Here we provide an example of how we analyzed heterogeneity among single gene regulatory asmple networks in Leiomyosarcomas (LMS). Networks were obtained with PANDA and LIONESS algorithms. Our dataset contains data for 80 TCGA LMS patient-specific gene regulatory networks.

Setup

The requirements are provided in a requirements.txt file. Install package

pip install netzoo_porcupine

Usage

import netzoo_porcupine as pcp
import statsmodels.stats.multitest as smm

First, we load the network data (80_tcga_lms_net.RData) and edges information (rand_genes.RData). Data files are available here : https://zenodo.org/records/8105729. In this example, we have patient-specific gene regulatory networks for 80 TCGA leiomyosarcoma patients. The first three columns in the data provide information on the regulators (TFs) and target genes. Edges information includes three columns: reg (the transcription factor's gene symbol),tar (Ensembl ID), prior (whether an edge is prior (1) or not (0)).

net_file_path = "/div/pythagoras/u1/rosaliec/80_tcga_lms_net.RData"
net = pcp.load_net_file(net_file_path)
print(net.head())


#   0E244FE2-7C17-4642-A51F-2CCA796D9C70  75435ED8-93E8-45FB-8480-98D8EB2EF8CB  \
#0                                  0.76                                  0.10   
#1                                  0.94                                  1.43   
#2                                  1.09                                  2.78   
#3                                  1.13                                  2.60   
#4                                 -0.71                                 -1.42   
#
#   B6D11678-15A9-4F43-A0A2-225067DCAF1C  B7F5A41E-9559-4329-81F5-1B88A74730B7  \
#0                                 -1.27                                  0.01   
#1                                  0.30                                  0.91   
#2                                  1.01                                  2.13   
#3                                  1.66                                  1.71   
#4                                  0.02                                  0.27   
#
#   04823F53-A12D-4852-8F34-77B9DCBB7DF0  49684C2B-D31C-4B45-A400-3497C3CCEC01  \
#0                                 -7.18                                  2.74   
#1                                 -5.69                                  2.43   
#2                                 -6.09                                  3.23   
#3                                 -6.04                                  2.88   
#4                                  5.31                                 -1.02   
#
#   FFDD7A12-DDEF-4974-8D60-64B7EEAAC994  830DFA6F-A85A-4317-82B2-791FAB998A01  \
#0                                 -0.80                                 -0.31   
#1                                 -1.55                                 -2.56   
#2                                  1.04                                 -2.07   
#...
#3                                  0.13  
#4                                 -0.06 

edges_file_path = "/div/pythagoras/u1/rosaliec/edges.RData"
edges = pcp.load_edges_file(edges_file_path)
print(edges.head())

#    reg   tar  prior
#0  AIRE  A1BG    0.0
#1  ALX1  A1BG    0.0
#2  ALX3  A1BG    0.0
#3  ALX4  A1BG    0.0
#4    AR  A1BG    0.0

print(len(edges['reg'].unique()))
#623

print(len(edges['tar'].unique()))
#17899

Our individual networks are represented by interactions between 623 TFs and 17,899 target genes. Then, we need to load in pathway file (.gmt file) available in the data folder.

pathways_file_path = "/div/pythagoras/u1/rosaliec/c2.cp.reactome.v7.1.symbols.gmt"
pathways = pcp.load_gmt(pathways_file_path)
print(len(pathways))
#1532

We need to filter pathways in order to include only pathways with genes that are present in our network file.

pathways_filt=pcp.filter_pathways(pathways,edges)
print(len(pathways_filt))
#1531

Then, we filter pathways based on their size, and all pathways with less than 10 and more than 150 genes are filtered out.

pathways_to_use=pcp.filter_pathways_size(pathways_filt,5,150)
print(len(pathways_to_use))
#1411

We select the top 10 pathways for the analysis (just to reduce computational time).

pathways_to_use=pathways_to_use[0:10]

Next, we perform PCA analysis for the 10 selected pathawys and extract the information on the variance explained by the first principal component in each pathway.

pca_res_pathways=pcp.pca_pathway(pathways_to_use, net, edges,scale_data = True, center_data = True)
print(pca_res_pathways)
#                                             pathway        pc1  n_edges  \
#0         REACTOME_GLYCOGEN_BREAKDOWN_GLYCOGENOLYSIS  16.787642     8099   
#1                     REACTOME_PYRIMIDINE_CATABOLISM  23.986634     7476   
#2  REACTOME_INHIBITION_OF_THE_PROTEOLYTIC_ACTIVIT...  18.545908    12460   
#3  REACTOME_PYRUVATE_METABOLISM_AND_CITRIC_ACID_T...  20.999699    32396   
#4   REACTOME_APOPTOTIC_CLEAVAGE_OF_CELLULAR_PROTEINS  16.808329    21182   
#5  REACTOME_ACTIVATION_OF_THE_PRE_REPLICATIVE_COM...  20.919336    20559   
#6        REACTOME_PROCESSING_OF_INTRONLESS_PRE_MRNAS  18.129761    10591   
#7                  REACTOME_GAP_JUNCTION_DEGRADATION  37.223203     7476   
#8                      REACTOME_BASE_EXCISION_REPAIR  14.640845    27412   
#9        REACTOME_BILE_ACID_AND_BILE_SALT_METABOLISM  21.326899    26789   
#
#   pathway_size  
#0            13  
#1            12  
#2            20  
#3            52  
#4            34  
#5            33  
#6            17  
#7            12  
#8            44  
#9            43

Then we perform a PCA analysis based on random gene sets. In this case we create 50 random gene sets for each pathway and run PCA for each gene set. This is just an example, for real comparisons we recommend to set n_perm to 1000.

pca_res_random = pcp.pca_random(net, edges, pca_res_pathways, pathways_filt, n_perm = 50, scale_data= True, center_data = True)

Then to identify significant pathways we run PORCUPINE, which compares the observed PCA score for a pathway to a set of PCA scores of random gene sets of the same size as pathway. Calculates p-value and effect size. The p-values can differ because of the random nature of this test.

res_porcupine = pcp.porcupine(pca_res_pathways,pca_res_random)
#P_adjust calculation using FDR method (Benjamini-Hochberg)
res_porcupine['p.adjust'] = smm.multipletests(res_porcupine['pval'], method='fdr_bh')[1]
print(res_porcupine)

#Pathways with size 13
#Pathways with size 12
#Pathways with size 20
#Pathways with size 52
#Pathways with size 34
#Pathways with size 33
#Pathways with size 17
#Pathways with size 44
#Pathways with size 43
#                                             pathway  pathway_size        pc1  \
#0         REACTOME_GLYCOGEN_BREAKDOWN_GLYCOGENOLYSIS            13  16.787642   
#1                     REACTOME_PYRIMIDINE_CATABOLISM            12  23.986634   
#2                  REACTOME_GAP_JUNCTION_DEGRADATION            12  37.223203   
#3  REACTOME_INHIBITION_OF_THE_PROTEOLYTIC_ACTIVIT...            20  18.545908   
#4  REACTOME_PYRUVATE_METABOLISM_AND_CITRIC_ACID_T...            52  20.999699   
#5   REACTOME_APOPTOTIC_CLEAVAGE_OF_CELLULAR_PROTEINS            34  16.808329   
#6  REACTOME_ACTIVATION_OF_THE_PRE_REPLICATIVE_COM...            33  20.919336   
#7        REACTOME_PROCESSING_OF_INTRONLESS_PRE_MRNAS            17  18.129761   
#8                      REACTOME_BASE_EXCISION_REPAIR            44  14.640845   
#9        REACTOME_BILE_ACID_AND_BILE_SALT_METABOLISM            43  21.326899   
#
#           pval        es      p.adjust  
#0  1.000000e+00  0.963420  1.000000e+00  
#1  2.007815e-05  0.638230  5.019538e-05  
#2  9.465539e-36  4.740924  9.465539e-35  
#3  9.997105e-01  0.520523  1.000000e+00  
#4  3.203126e-07  0.808265  1.601563e-06  
#5  1.000000e+00  1.275528  1.000000e+00  
#6  6.475769e-03  0.364771  1.295154e-02  
#7  9.998534e-01  0.551486  1.000000e+00  
#8  1.000000e+00  1.724658  1.000000e+00  
#9  2.898786e-06  0.718975  9.662619e-06

Significant pathways can be selected based on adjusted p-value, explained variance and effect size.

To obtain pathway-based patient heterogeneity scores on the first two principal components.

ind_res = pcp.get_pathway_ind_scores(pathways_to_use, net,edges, scale_data = True, center_data = True)
print(ind_res[0:10])
#                                                                         pathway  \
#0E244FE2-7C17-4642-A51F-2CCA796D9C70  REACTOME_GLYCOGEN_BREAKDOWN_GLYCOGENOLYSIS   
#75435ED8-93E8-45FB-8480-98D8EB2EF8CB  REACTOME_GLYCOGEN_BREAKDOWN_GLYCOGENOLYSIS   
#B6D11678-15A9-4F43-A0A2-225067DCAF1C  REACTOME_GLYCOGEN_BREAKDOWN_GLYCOGENOLYSIS   
#B7F5A41E-9559-4329-81F5-1B88A74730B7  REACTOME_GLYCOGEN_BREAKDOWN_GLYCOGENOLYSIS   
#04823F53-A12D-4852-8F34-77B9DCBB7DF0  REACTOME_GLYCOGEN_BREAKDOWN_GLYCOGENOLYSIS   
#49684C2B-D31C-4B45-A400-3497C3CCEC01  REACTOME_GLYCOGEN_BREAKDOWN_GLYCOGENOLYSIS   
#FFDD7A12-DDEF-4974-8D60-64B7EEAAC994  REACTOME_GLYCOGEN_BREAKDOWN_GLYCOGENOLYSIS   
#830DFA6F-A85A-4317-82B2-791FAB998A01  REACTOME_GLYCOGEN_BREAKDOWN_GLYCOGENOLYSIS   
#58578614-E4A3-4655-BBAB-F65851625E0A  REACTOME_GLYCOGEN_BREAKDOWN_GLYCOGENOLYSIS   
#4139E0C9-F712-4A25-8B59-587533B93B3E  REACTOME_GLYCOGEN_BREAKDOWN_GLYCOGENOLYSIS   
#
#                                          Dim.1      Dim.2  
#0E244FE2-7C17-4642-A51F-2CCA796D9C70  -0.124027  -7.807079  
#75435ED8-93E8-45FB-8480-98D8EB2EF8CB  20.641956  -6.844048  
#B6D11678-15A9-4F43-A0A2-225067DCAF1C   6.240503  -8.771506  
#B7F5A41E-9559-4329-81F5-1B88A74730B7  24.644962  -0.103552  
#04823F53-A12D-4852-8F34-77B9DCBB7DF0  -8.556204  -4.645454  
#49684C2B-D31C-4B45-A400-3497C3CCEC01   2.916440 -21.810588  
#FFDD7A12-DDEF-4974-8D60-64B7EEAAC994  12.886658 -15.456415  
#830DFA6F-A85A-4317-82B2-791FAB998A01  39.929420   5.942546  
#58578614-E4A3-4655-BBAB-F65851625E0A -10.705200 -19.711465  
#4139E0C9-F712-4A25-8B59-587533B93B3E -95.639682  60.061360

PORCUPINE allows to identify patient subtypes based on gene regulatory networks for each of the significant pathways. For this, K-means clustering is applied to the pathway-based patient heterogeneity scores on the first two principal components. The optimal number of clusters can be determined prior to clustering using the Average Silhouette Method.

Here we provide example of stratifying patients based on the “REACTOME VITAMIN B5 PANTOTHENATE METABOLISM” pathway.

print(pathways[37])
#('REACTOME_VITAMIN_B5_PANTOTHENATE_METABOLISM', ['COASY', 'VNN1', 'VNN2', 'PANK3', 'PDZD11', 'SLC25A16', 'PANK2', 'PPCS', 'ENPP2', 'SLC5A6', 'PPCDC', 'AASDHPPT', 'PANK1', 'ENPP3', 'PANK4', 'FASN', 'ENPP1'])

pcp.select_number_clusters(pathways[37],net, edges)
#The optimal number of clusters is 2. To visualize clusters
pcp.visualize_clusters(pathways[37], net, edges, number_of_clusters = 2)

Project details

These details have not been verified by PyPI

Project links

homepage

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Release history Release notifications | RSS feed

This version

1.0.3

Jul 31, 2024

1.0.2

Jul 31, 2024

1.0.1

Jul 24, 2024

1.0.0

Jul 24, 2024

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

netzoo_porcupine-1.0.3.tar.gz (17.8 kB view details)

Uploaded Jul 31, 2024 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

netzoo_porcupine-1.0.3-py3-none-any.whl (15.7 kB view details)

Uploaded Jul 31, 2024 Python 3

File details

Details for the file netzoo_porcupine-1.0.3.tar.gz.

File metadata

Download URL: netzoo_porcupine-1.0.3.tar.gz
Upload date: Jul 31, 2024
Size: 17.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.1 CPython/3.12.3

File hashes

Hashes for netzoo_porcupine-1.0.3.tar.gz
Algorithm	Hash digest
SHA256	`8920b2d0e2eb52d00978dc31a627d0d458b99388ccf99a0b335a2d7c23b1affc`
MD5	`4ef817bd408fbff5770d1bc1c9936222`
BLAKE2b-256	`20a48d4d9818c7009e6b6704e6377d65493d7daa3bf4999c32ff8b49269bd2be`

See more details on using hashes here.

File details

Details for the file netzoo_porcupine-1.0.3-py3-none-any.whl.

File metadata

Download URL: netzoo_porcupine-1.0.3-py3-none-any.whl
Upload date: Jul 31, 2024
Size: 15.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.1 CPython/3.12.3

File hashes

Hashes for netzoo_porcupine-1.0.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b2a06808369139f7be618ebe8cf991927fd7a2bd2b5694e86ebe35734b9ddde6`
MD5	`6f26936c783c473422c945482765feea`
BLAKE2b-256	`2f23e45018aaf1f8a8cb9cfff6593ab4813d355c08668eb5eb7ae9934ec62fb8`

See more details on using hashes here.

netzoo-porcupine 1.0.3

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

PORCUPINE

Method

Setup

Usage

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes