Package for standardization and evaluation of biological networks
Project description
Network Evaluation Tools 2 is a Python 3.10 package accompanying the manuscript: State of the Interactomes: an evaluation of molecular networks for generating biological insights. Sarah N Wright et al. biorxiv.org (2024).
This repository contains all python code used in the study, as well as example usage on a small network example. This package updates and expands the work developed as part of Huang and Carlin et al. 2018.
Installation
To install the package, run the following command:
pip install neteval
To install the package from source, clone the repository and run the following command:
git clone https://github.com/sarah-n-wright/Network_Evaluation_Tools cd Network_Evaluation_Tools make dist pip install dist/netevalcmd*whl
For example usage of command line scripts see Example Usage. For example usage of all other funcitonality see State of the Interactomes Notebooks.
Optional External Packages
Interaction Prediction algorithms
L3: The L3 algorithm can be installed from https://github.com/kpisti/L3. The path to the executable can then be used with edge_prediction.py.
MPS(T): The MPS(T) algorithm can be installed from https://github.com/spxuw/PPI-Prediction-Project, and should be installed and run from a separate environment.
AlphaFold Multimer
AlphaFold-Multimer can be installed from https://github.com/YoshitakaMo/localcolabfold, and should be installed and run from a separate environment.
Dependencies
Modules in this package
Interactome processing, standardization and annotation
Modules:
processing_functions: Module utilized by process_data, contains the NetworkData class.__
gene_mapper: Includes query_hgnc, query_ensembl, query_uniprot. Modules to map identifiers to NCBI Gene IDs from a variety of identifier types including: Uniprot, Ensembl, EnsemblProtein, RefSeq, Symbol. Incorporates methods to handle out of date identifiers.
node_annotation: Module for downloading and extracting gene annotation data from HGNC, NCBI, Uniprot and Ensembl. Includes the class ExpressionData for loading and analyzing mRNA and protein expression data.
network_statistics: Module to extract summary statistics for a set of network, such as node and edge counts.
gsea_functions: Module for performing downloading, processing and analyzing Gene Ontology data.
Script(s):
process_data.py: Python script to process a raw interactome based on a configuration file
For detailed usage see ExampleUsage/NP_NetworkProcessing.README.md
Gene set recovery performance evaluation
Evaluation data collection and processing
Modules:
get_disgen_associations: Module to download DisGeNET associations and generate genesets
get_gwas_associations: Module to download GWAS Catalog associations and generate genesets
Script(s):
prepare_evaluation_data.py: Python script to convert gene identifiers within gene sets and filter based on network coverage.
For detailed usage, see ExampleUsage/GSR_GeneSetRecovery.README.md
Gene set recovery
Modules:
network_evaluation_functions: Module for performing and evaluating gene set recovery.
network_propagation: Underlying network propagation methodology.
shuffle_networks: Module for creating degree-matched shuffled networks
gene_set_recovery_results: Module to load, evaluate, and visualize gene set recovery results. Includes the class EvaluationResults.
Script(s):
run_network_evaluation.py: Python script to perform gene set recovery performance evaluation
For detailed usage, see ExampleUsage/GSR_GeneSetRecovery.README.md
Parsimonious Composite Networks (PCNets)
Script(s):
network_constructor: Python script to create composite networks using the global composite and ranked composite approaches. See ExampleUsage/run_composite.sh.
For detailed usage see ExampleUsage/PC_PCNets.README.md
Interaction & complex prediction
Modules:
community_annotation: Module for assessing the quality of gene communities in a network.
edge_prediction: Module for performing and analyzing edge prediction results.
Script(s):
edge_prediction.py: Script for performing edge prediciton evaluation.
alphafold_results.py: Script for parsing and analyzing AlphaFold results.
complex_evaluation.py: Script for evaluating hierarchical complex prediction results.
For detailed usage see ExampleUsage/IP_InteractionPrediction.README.md and ExampleUsage/AF_AlphaFold.README.md
General utilities
data_import_export_tools: Module of functions for importing and exporting the various data formats used by this package.
Timer: Class that measures the elapsed time of various processing steps and outputs a summary.
Provided Data and Implementation Examples
ExampleUsage
This directory contains README and bash scripts for implemenation of each stage of the network evaluation pipeline. All examples utilize three small interactomes (DIP, PID2, and Wan). While most of the pipeline is designed to run in a high-performance computing environment, most of these examples can be run on a local machine.
NP_NetworkProcessing.README.md
GSR_GeneSetRecovery.README.md
PC_PCNets.README.md
IP_InteractionPrediction.README.md
AF_AlphaFold.README.md
Data
This directory contains key data sets used for the evaluation of interactomes for prosperity, including:
Annotation data from HGNC, Ensembl, NCBIm, and Uniprot
Gene sets analyzed
Gene conservation scores
Example networks (Wan, DIP, PID2)
CORUM and PANTHER edge lists
StateOfTheInteractomes_Notebooks
This directory contains code and guidelines for reproducing data and figures contained in the manuscript.
Notebooks
1_Statistics_and_Representation.ipynb
2_GO_analysis.ipynb
3_Gene_Set_Recovery.ipynb
4_Composite_networks.ipynb
5_Interaction_and_Complex_Prediction.ipynb
6_AlphaFold_Assessment.ipynb
Due to the computational requirements of the underlying analyses, these notebooks leverage pre-computed data and example implementations with small networks. Much of the State of the Interactomes pipeline is designed to run in a high-performance computing environment. Please see ExampleUsage for guidelines on implementing each stage of the pipeline.
To run the State Of the Interactomes Notebooks, install the required dependencies:
pip install -r requirements_stateoftheinteractomes.txt
Inputs/Outputs
StateOfTheInteractomes_Notebooks/Data/ contains pre-computed data for visualization
Other data neccessary for analysis is contained in Data/
Generated figures are saved to StateOfTheInteractomes_Notebooks/Figures
Generated data is saved to Data/example_outputs/
StateOfTheInteractomes_Notebooks/Data
This directory contains data necessary for recreating the manuscript figures, including Supplemental Tables 2-5,7-8, and other precomputed results.
StateOfTheInteractomes_Notebooks/Supplemental_Code
This directory contains code used in the generation of the manuscript results that is not included in the primary neteval package. This includes implementation of EGAD (Extending Guilt by Association by Degree) and HiDeF (Hierarchical community Decoding Framework), as well as processing of PDB files and gene conservation scores.
To run Supplemental Code, see additional dependencies in the associated README files:
Gene Function Prediction by GBA (EGAD_README.md)
Processing of Gene Conservation Scores (phyloP_README.md)
Compatibility
Python 3.10+
Citing neteval
If you use neteval in your research, please cite the following publication:
Wright, SN., et al. State of The Interactomes: an evaluation of molecular networks for generating biological insights.
Credits
This package is built from the original Network Evaluation Tools developed by Huang and Carlin et al. 2018.
This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.
History
0.2.2 (2024-11-07)
Final versions of notebooks associated with State of the Interactomes manuscript
Fixed redundant python requirement in setup.py
Updated documentation
0.2.1 (2024-10-07)
Updated documentation
0.2.0 (2024-10-03)
First release on PyPI.
0.1.0 (2023-03-22)
Pre-release
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file neteval-0.2.2.tar.gz
.
File metadata
- Download URL: neteval-0.2.2.tar.gz
- Upload date:
- Size: 88.9 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.0
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | d60135fa6295b07148f848aba2937e7ea68ac010ba620796c9612a8e7138a26b |
|
MD5 | f4d7b7665746b13f8cdcb5627ea1f0c0 |
|
BLAKE2b-256 | 4e757c0621bb328fd0b196d4263faa820c9ac980e450bfbc484d0b6ff53e1197 |
File details
Details for the file neteval-0.2.2-py2.py3-none-any.whl
.
File metadata
- Download URL: neteval-0.2.2-py2.py3-none-any.whl
- Upload date:
- Size: 116.6 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.0
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 6bbe1168bef7699318ae42c9ae4ee91e895bf81696fd2cc9e66003355b8c2cdb |
|
MD5 | 3a597d03a6204b3415c39b15d0de9fa5 |
|
BLAKE2b-256 | 5c3fa19605071f1eae4123c7342cb2d053d2c4009f3ca464ed5b2d20b0490fec |