Module for working with protein networks (gene ontology, enrichment, protein-protein interactions, etc.)
ProteinNetworks
The library contains convenient tools for rapid analysis of gene ontology, enrichment, and protein-protein interaction data. It is based on the stringdb library. Some features require R to be installed (see EnrichmentAnalysis.prioretizingGO()).
The module will contain 4 sets of tools:
- Enrichment Analysis
- Protein networks Analysis
- Group comparing tools
- Visualization tools
Get Started
pip install -i https://test.pypi.org/simple/ ProteinNetworks==0.1.3
Contents:
- module: ProteinNetworks.STRING_enrichment
  - class: EnrichmentAnalysis
    - methods:
      - EnrichmentAnalysis.create_subframe_by_names()
      - EnrichmentAnalysis.drop_duplicated_genes()
      - EnrichmentAnalysis.get_category_terms()
      - EnrichmentAnalysis.get_enrichment()
      - EnrichmentAnalysis.get_genes_by_localization()
      - EnrichmentAnalysis.get_genes_of_term()
      - EnrichmentAnalysis.get_mapped()
      - EnrichmentAnalysis.prioretizingGO()
      - EnrichmentAnalysis.proteins_participation_in_the_category()
      - EnrichmentAnalysis.save_table()
      - EnrichmentAnalysis.show_category_terms()
      - EnrichmentAnalysis.show_enrichest_terms_in_category()
      - EnrichmentAnalysis.show_enrichment_categories()
Enrichment Analysis
Contains a set of functions, based on the stringdb library, for gene ontology analysis and enrichment analysis. See the examples in the Colab Notebook.
ProteinNetworks.STRING_enrichment module
class ProteinNetworks.STRING_enrichment.EnrichmentAnalysis (data, enrichment=None, protein_id_type='UniProtID')
Bases: object
EnrichmentAnalysis class.
- Parameters:
- data: DataFrame containing the protein IDs for analysis. It must contain either a “Gene” or a “UniProtID” column
- enrichment: DataFrame containing the results of a previous enrichment analysis
- protein_id_type: type of protein ID. Valid Types
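A typical session might look like the sketch below. It uses only the constructor and methods documented in this reference; the helper name run_enrichment_workflow and the order of the calls (mapping first, then enrichment) are assumptions, and running it requires pandas, an installed ProteinNetworks package, and network access.

```python
def run_enrichment_workflow(uniprot_ids, species=9606):
    """Map protein IDs to STRING IDs, run enrichment, and list the categories.

    Hypothetical helper: the name and the call order are assumptions, not part
    of the library.
    """
    # Imported lazily so the sketch can be defined without the packages
    # installed; calling it needs pandas, ProteinNetworks and network access.
    import pandas as pd
    from ProteinNetworks.STRING_enrichment import EnrichmentAnalysis

    # The input frame must contain a "Gene" or a "UniProtID" column.
    data = pd.DataFrame({"UniProtID": list(uniprot_ids)})
    analysis = EnrichmentAnalysis(data, protein_id_type="UniProtID")

    analysis.get_mapped(species=species)   # find STRING IDs for the protein IDs
    analysis.get_enrichment()              # results are stored in analysis.enrichment
    analysis.show_enrichment_categories()  # display the available categories
    return analysis
```

For example, run_enrichment_workflow(["P04637", "P38398"]) would start an analysis for the human proteins TP53 and BRCA1.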
static create_subframe_by_names(df, column: str, names: [<class 'list'>, <class 'tuple'>, <class 'set'>], add: str = 'first')
Finds rows in the original dataset and returns a sub-DataFrame containing the rows whose value in the selected column matches one of the input names.
- Parameters:
- df – target DataFrame
- column – the selected column in which names will be searched
- names – list of target names whose records need to be found in the table
- add – how found rows are added, one of [‘first’, ‘last’, ‘all’]: ‘first’ adds only the first entry, ‘last’ adds only the last entry, ‘all’ adds all entries
- Returns: sub-dataframe including input names in selected column
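The effect of the add parameter can be illustrated with a small pure-Python sketch. This is not the library's implementation: rows are modelled as a list of dicts instead of a DataFrame, select_rows_by_names is a hypothetical name, and applying ‘first’ and ‘last’ separately for each name is an assumption about the intended behavior.

```python
def select_rows_by_names(rows, column, names, add="first"):
    """Return the rows whose value in `column` is one of `names`.

    rows is a list of dicts standing in for DataFrame rows (an assumption).
    """
    if add not in ("first", "last", "all"):
        raise ValueError("add must be 'first', 'last' or 'all'")
    selected = []
    for name in names:
        # All rows matching this name, in their original order.
        matches = [row for row in rows if row[column] == name]
        if not matches:
            continue
        if add == "first":
            selected.append(matches[0])
        elif add == "last":
            selected.append(matches[-1])
        else:  # "all"
            selected.extend(matches)
    return selected


# Toy data for illustration only.
rows = [
    {"Gene": "TP53", "score": 1},
    {"Gene": "BRCA1", "score": 2},
    {"Gene": "TP53", "score": 3},
]
first_hit = select_rows_by_names(rows, "Gene", ["TP53"], add="first")
all_hits = select_rows_by_names(rows, "Gene", ["TP53"], add="all")
```

Here first_hit holds only the row with score 1, while all_hits holds both TP53 rows.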
drop_duplicated_genes(silent=False)
Drops duplicated genes.
- Parameters:
- subset: (list) Only consider certain columns for identifying duplicates; by default, all columns are used.
- Returns: DataFrame of the dropped genes
get_category_terms(category: str, term_type: str = 'id')
Returns the set of all terms in the chosen category.
- Parameters:
- category: Name of the category
- term_type: ‘id’ or ‘description’. ‘id’ returns the term IDs of the category (for example, GO terms); ‘description’ returns the descriptions of those IDs
- Returns: set of terms
get_enrichment()
Performs enrichment analysis. The results are stored in self.enrichment.
- Returns: None
get_genes_by_localization(compartments: list, set_operation: str, save=False)
Returns the proteins localized in the target compartments. You can also apply common set operations to the gene sets of the compartments.
Example: get_genes_by_localization(['Nucleus', 'Cytosol'], 'union') returns proteins localized in Nucleus or Cytosol.
- Parameters:
- compartments: list of compartments. Note that:
  - Capitalization matters. Get the available compartment names by calling get_components_list().
  - The order of compartments matters if you want a set difference.
- set_operation: operation between sets. The operation is applied sequentially to all sets from compartments: [A, B, C], 'intersection' -> A and B and C.
For example:
- get_genes_by_localization(['Nucleus', 'Cytosol'], 'difference') returns proteins found only in the nucleus (Nucleus minus Cytosol).
- get_genes_by_localization(['Cytosol', 'Nucleus'], 'union') returns cytosol and nucleus proteins.
- get_genes_by_localization(['all', 'Nucleus'], 'difference') returns all proteins except nucleus proteins.
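The sequential application of a set operation described above can be sketched with functools.reduce. This is an illustration and not the library's code: combine_compartments, SET_OPERATIONS, and the toy gene sets are hypothetical, and the handling of the special name 'all' (the union of every compartment) is an assumption.

```python
from functools import reduce

# Supported operations, applied left to right over the list of gene sets.
SET_OPERATIONS = {
    "union": set.union,
    "intersection": set.intersection,
    "difference": set.difference,
}


def combine_compartments(genes_by_compartment, compartments, set_operation):
    """Fold `set_operation` over the gene sets of `compartments`, in order."""
    if set_operation not in SET_OPERATIONS:
        raise ValueError(f"unknown set operation: {set_operation!r}")
    if not compartments:
        raise ValueError("at least one compartment is required")
    # Assumption: 'all' stands for every gene in any compartment.
    all_genes = set().union(*genes_by_compartment.values())
    gene_sets = [
        all_genes if name == "all" else set(genes_by_compartment[name])
        for name in compartments
    ]
    # [A, B, C] with 'intersection' becomes (A & B) & C, and so on.
    return reduce(SET_OPERATIONS[set_operation], gene_sets)


# Toy localization data for illustration only (not curated annotations).
genes_by_compartment = {
    "Nucleus": {"TP53", "BRCA1", "GAPDH"},
    "Cytosol": {"GAPDH", "ACTB"},
}
nucleus_only = combine_compartments(
    genes_by_compartment, ["Nucleus", "Cytosol"], "difference"
)
cytosol_only = combine_compartments(
    genes_by_compartment, ["Cytosol", "Nucleus"], "difference"
)
```

Swapping the order of the two compartments changes the result of 'difference' (nucleus_only versus cytosol_only), while 'union' and 'intersection' are unaffected by order.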
get_genes_of_term(term: str)
Gets the genes associated with the target term from the enrichment table.
- Parameters:
- term: target GO term from column ‘term’ in enrichment table
- Returns: list of genes associated with target term
get_mapped(species=9606)
Performs gene mapping: finds the STRING IDs that correspond to the protein IDs. This step is important for subsequent analysis.
- Parameters:
- species: ID of the organism. For example, for human, species=9606
- Returns: None
prioretizingGO(terms: [<class 'list'>, <class 'set'>], organism='Human', domain='BP')
Prioritizes GO terms using an R script with the GOxploreR package (doi:10.1038/s41598-020-73326-3). See ‘RScript Prioretizing_GO.R’. Works with R 4.3.x. You need to add RScript to PATH.
If you use this function in Google Colab, you will have to install the R packages at the first launch. This may take a long time (up to 20 minutes).
- Parameters:
- terms – list of GO-terms
- organism – name of target organism
- domain – name of the domain in the GO graph. Available inputs: ‘BP’ (Biological Process), ‘CC’ (Cellular Component), ‘MF’ (Molecular Function)
- Returns: list of prioritized GO terms
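Because this method depends on an external R installation, it can help to check that the R script runner is reachable before calling it. The sketch below uses only the Python standard library; the helper name rscript_available is hypothetical, and the executable name "Rscript" is an assumption about what the library invokes.

```python
import shutil


def rscript_available(executable="Rscript"):
    """Return True if `executable` can be found on PATH."""
    # shutil.which returns the full path of the executable, or None.
    return shutil.which(executable) is not None


if not rscript_available():
    print("Rscript was not found on PATH; install R or add it to PATH first.")
```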
proteins_participation_in_the_category(df, category, term_type='id', term_sep='\n')
Checks which terms each protein participates in and builds a statistics table.
- Parameters:
- df: target DataFrame
- category: Name of the category
- term_type: ‘id’ or ‘description’. ‘id’ returns the term IDs of the category (for example, GO terms); ‘description’ returns the descriptions of those IDs
- term_sep: the terms connected with each protein are saved in one cell; choose the separator between terms
- Returns: None
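The role of term_sep can be illustrated by inverting a term-to-genes mapping into one joined cell per protein. This is a sketch under assumptions, not the library's code: terms_per_protein is a hypothetical name, the input is a plain dict, and the per-protein term count is only a guess at what the statistics table contains.

```python
def terms_per_protein(genes_by_term, term_sep="\n"):
    """Return {gene: (number_of_terms, joined_terms)} from {term: [genes]}."""
    terms_by_gene = {}
    for term, genes in genes_by_term.items():
        for gene in genes:
            # Collect every term in which this gene appears.
            terms_by_gene.setdefault(gene, []).append(term)
    # Join each gene's terms into a single cell value using the separator.
    return {
        gene: (len(terms), term_sep.join(terms))
        for gene, terms in terms_by_gene.items()
    }


# Toy enrichment data for illustration only.
genes_by_term = {
    "GO:0006915": ["TP53", "BRCA1"],
    "GO:0006281": ["BRCA1"],
}
table = terms_per_protein(genes_by_term, term_sep="; ")
```

With this input, BRCA1 ends up with two terms joined into the single cell "GO:0006915; GO:0006281".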
static save_table(table, name, saveformat='xlsx', index: bool = True)
function for saving DataFrame tables
- Parameters:
- table: DataFrame
- name: name of file
- saveformat: format of saving file: ‘xlsx’ or ‘csv’
- index: whether to write the index to the saved table
- Returns: None
show_category_terms(category: str, show: [<class 'int'>, <class 'str'>] = 10, sort_by='genes', save: bool = False, savename='terms', saveformat='xlsx')
Displays all terms in the category and the number of genes associated with each.
- Parameters:
- category: Name of the category. You can check the available categories by calling the ‘show_enrichment_categories’ method
- show: “all” or an integer: the number of rows to display
- sort_by: [“genes”, “term”]: sort by number of genes (descending) or by term name (ascending)
- save: set to True to save the table. By default, it is saved in .xlsx format
- savename: name of the file; used only with save=True
- saveformat: format of saving file: ‘xlsx’ or ‘csv’
- Returns: None
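The show and sort_by options described above can be sketched in plain Python. This is an illustration, not the library's implementation: top_terms is a hypothetical name, the input is a plain dict of gene counts, and the counts are made up.

```python
def top_terms(gene_counts, show=10, sort_by="genes"):
    """Return (term, gene_count) pairs, sorted and truncated like the options above."""
    if sort_by == "genes":
        # Largest number of genes first.
        ordered = sorted(gene_counts.items(), key=lambda item: item[1], reverse=True)
    elif sort_by == "term":
        # Alphabetical by term name.
        ordered = sorted(gene_counts.items(), key=lambda item: item[0])
    else:
        raise ValueError("sort_by must be 'genes' or 'term'")
    # "all" keeps every row; an integer keeps only the first `show` rows.
    return ordered if show == "all" else ordered[:show]


# Made-up counts for illustration only.
gene_counts = {"GO:0006915": 12, "GO:0006281": 30, "GO:0008152": 21}
by_genes = top_terms(gene_counts, show=2, sort_by="genes")
by_term = top_terms(gene_counts, show="all", sort_by="term")
```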
show_enrichest_terms_in_category(category: str, count: int = 10, sort_by='fdr', save: bool = False, savename='enrichment', saveformat='xlsx')
Shows the most enriched terms in the chosen category (the top count terms).
- Parameters:
- category: Name of the category. You can check the available categories by calling the ‘show_enrichment_categories’ method
- count: number of terms to show
- sort_by: sort the list by one of ‘fdr’, ‘p_value’, or ‘number_of_genes’
- save: set to True to save the table. By default, it is saved in .xlsx format
- savename: name of the file; used only with save=True
- saveformat: format of saving file: ‘xlsx’ or ‘csv’
- Returns: None
show_enrichment_categories()
Shows the available enrichment categories for the current dataset.
- Returns: None