Module for working with protein networks (gene ontology, enrichment, protein-protein interactions, etc.)

ProteinNetworks

The library contains convenient tools for rapid analysis of gene ontology, enrichment, and protein-protein interaction data. It is based on the stringdb library. Some features require R to be installed (see EnrichmentAnalysis.prioretizingGO()).

The module will contain 4 sets of tools:

  • Enrichment Analysis
  • Protein networks Analysis
  • Group comparing tools
  • Visualization tools

Get Started

pip install -i https://test.pypi.org/simple/ ProteinNetworks==0.1.3

Enrichment Analysis

Contains a set of functions based on the stringdb library for gene ontology analysis and enrichment analysis. See the examples in the Colab notebook.

ProteinNetworks.STRING_enrichment module

class ProteinNetworks.STRING_enrichment.EnrichmentAnalysis (data, enrichment=None, protein_id_type='UniProtID')

Bases: object

EnrichmentAnalysis class.

  • Parameters:
    • data: DataFrame containing the protein IDs for analysis. It must contain either a “Gene” or a “UniProtID” column.
    • enrichment: DataFrame containing the results of a previous enrichment analysis
    • protein_id_type: type of protein ID. Valid Types

static create_subframe_by_names(df, column: str, names: [<class 'list'>, <class 'tuple'>, <class 'set'>], add: str = 'first')

Finds rows in the original dataset and returns a sub-DataFrame of the rows whose value in the selected column is one of the input names.

  • Parameters:
    • df – target DataFrame
    • column – the column in which the names are searched
    • names – list, tuple, or set of target names whose records should be found in the table
    • add – one of ‘first’, ‘last’, ‘all’; controls which matching rows are added.

      ‘first’ - add only the first entry

      ‘last’ - add only the last entry

      ‘all’ - add all entries

  • Returns: sub-DataFrame containing the rows that match the input names in the selected column
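
The effect of the add parameter can be illustrated with a small standalone sketch. It uses a list of dicts in place of a DataFrame and a hypothetical helper, select_rows; it is not the library's implementation, only one reading of the semantics described above.

```python
def select_rows(rows, column, names, add="first"):
    """Return rows whose `column` value is in `names`.

    add='first' keeps only the first match per name, 'last' only the last,
    'all' keeps every match. Hypothetical helper, not the library's code.
    """
    if add not in ("first", "last", "all"):
        raise ValueError("add must be 'first', 'last' or 'all'")
    names = set(names)
    matches = [row for row in rows if row[column] in names]
    if add == "all":
        return matches
    picked = {}
    for row in matches:
        key = row[column]
        # 'last' overwrites earlier matches; 'first' keeps the earliest one.
        if add == "last" or key not in picked:
            picked[key] = row
    return list(picked.values())


# Made-up example rows standing in for a DataFrame.
rows = [
    {"Gene": "TP53", "score": 1},
    {"Gene": "BRCA1", "score": 2},
    {"Gene": "TP53", "score": 3},
]
first = select_rows(rows, "Gene", ["TP53"], add="first")
last = select_rows(rows, "Gene", ["TP53"], add="last")
every = select_rows(rows, "Gene", ["TP53"], add="all")
```

With duplicated names in the column, ‘first’ and ‘last’ each yield one row per name, while ‘all’ keeps every matching row.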

drop_duplicated_genes(silent=False)

Drops duplicated genes.

  • Parameters:
    • subset: (list) columns to consider when identifying duplicates; by default, all columns are used
  • Returns: DataFrame of the dropped genes

get_category_terms(category: str, term_type: str = 'id')

Returns the set of all terms in the chosen category.

  • Parameters:
    • category: name of the category

    • term_type: ‘id’ or ‘description’.

      id - returns the term IDs of the category (for example, GO terms)

      description - returns the descriptions of the term IDs of the category

  • Returns: set of terms

get_enrichment()

Performs enrichment analysis. The results are stored in self.enrichment.

  • Returns: None

get_genes_by_localization(compartments: list, set_operation: str, save=False)

Gets the proteins localized in the target compartments. You can also apply common set operations to the gene sets of the compartments.

Example: get_genes_by_localization([‘Nucleus’, ‘Cytosol’], ‘union’) returns proteins localized in the nucleus or the cytosol.

  • Parameters:
    • compartments: list of compartments. Note:

      1. Capitalization matters. Get the available compartment names by calling get_components_list().

      2. The order of compartments matters if you want a set difference.

    • set_operation: operation between the sets. The operation is applied sequentially to all sets from compartments: [A, B, C], 'intersection' -> A and B and C

      For example:

      get_genes_by_localization([‘Nucleus’, ‘Cytosol’], ‘difference’) - returns proteins found in the nucleus but not in the cytosol

      get_genes_by_localization([‘Cytosol’, ‘Nucleus’], ‘union’) - returns cytosol and nucleus proteins

      get_genes_by_localization([‘all’, ‘Nucleus’], ‘difference’) - returns all proteins except nucleus proteins
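
The sequential application of a set operation can be sketched with plain Python sets. The localization data, the all_genes set, and the genes_by_localization helper below are hypothetical stand-ins for illustration; the library's own data source and internals may differ.

```python
from functools import reduce

# Hypothetical localization data: compartment name -> set of gene names.
localization = {
    "Nucleus": {"TP53", "BRCA1", "MYC"},
    "Cytosol": {"TP53", "GAPDH"},
}
# Hypothetical full gene set, used for the special name 'all'.
all_genes = {"TP53", "BRCA1", "MYC", "GAPDH", "ALB"}

OPERATIONS = {
    "union": set.union,
    "intersection": set.intersection,
    "difference": set.difference,
}


def genes_by_localization(compartments, set_operation):
    """Fold the chosen set operation over the compartments, left to right."""
    sets = [all_genes if name == "all" else localization[name]
            for name in compartments]
    return reduce(OPERATIONS[set_operation], sets)


only_nucleus = genes_by_localization(["Nucleus", "Cytosol"], "difference")
either = genes_by_localization(["Cytosol", "Nucleus"], "union")
not_nucleus = genes_by_localization(["all", "Nucleus"], "difference")
```

Because the fold runs left to right, swapping the compartments changes the result of ‘difference’ but not of ‘union’ or ‘intersection’.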

get_genes_of_term(term: str)

Gets the genes associated with the target term from the enrichment table.

  • Parameters:
    • term: target GO term from column ‘term’ in enrichment table
  • Returns: list of genes associated with target term

get_mapped(species=9606)

Performs gene mapping: finds STRING IDs by protein IDs. This step is needed for further analysis.

  • Parameters:
    • species: ID of the organism, for example species=9606 for human
  • Returns: None

prioretizingGO(terms: [<class 'list'>, <class 'set'>], organism='Human', domain='BP')

Prioritizes GO terms using an R script with the GOxploreR package (doi:10.1038/s41598-020-73326-3). See the R script ‘Prioretizing_GO.R’. It works with R 4.3.x. You need to add Rscript to PATH.

If you use this function in Google Colab, you will have to install the R packages at the first launch. This may take a long time (up to 20 minutes).

  • Parameters:
    • terms – list of GO terms
    • organism – name of the target organism
    • domain – name of the domain in the GO graph. Available inputs:

      ‘BP’ - Biological Process

      ‘CC’ - Cellular Component

      ‘MF’ - Molecular Function

  • Returns: list of prioritized GO terms
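
Since this function depends on Rscript being reachable through PATH, it can help to check for it first. The snippet below is a minimal sketch using only the standard library; the rscript_available helper is hypothetical and not part of ProteinNetworks.

```python
import shutil


def rscript_available(executable="Rscript"):
    """Return True if the given executable can be found on PATH."""
    return shutil.which(executable) is not None


if rscript_available():
    print("Rscript found at", shutil.which("Rscript"))
else:
    print("Rscript is not on PATH; prioretizingGO() will likely fail")
```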

proteins_participation_in_the_category(df, category, term_type='id', term_sep='\n')

Checks which terms each protein participates in and builds a statistics table.

  • Parameters:
    • df: target DataFrame

    • category: name of the category

    • term_type: ‘id’ or ‘description’.

      id - returns the term IDs of the category (for example, GO terms)

      description - returns the descriptions of the term IDs of the category

    • term_sep: separator between terms. All terms associated with a protein are saved in one cell.

  • Returns: None
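
The role of term_sep can be illustrated with a standalone sketch that collects the terms of each gene and joins them into one string. The row layout (a ‘genes’ list per term), the gene-to-term assignments, and the participation_table helper are assumptions made for illustration; they do not describe the library's actual table format.

```python
from collections import defaultdict

# Hypothetical enrichment rows: each term with the genes annotated to it.
enrichment_rows = [
    {"term": "GO:0006915", "genes": ["TP53", "BRCA1"]},
    {"term": "GO:0006281", "genes": ["BRCA1"]},
    {"term": "GO:0008283", "genes": ["TP53", "MYC"]},
]


def participation_table(rows, term_sep="\n"):
    """Map each gene to (number of terms, terms joined by term_sep)."""
    terms_by_gene = defaultdict(list)
    for row in rows:
        for gene in row["genes"]:
            terms_by_gene[gene].append(row["term"])
    return {
        gene: (len(terms), term_sep.join(terms))
        for gene, terms in terms_by_gene.items()
    }


table = participation_table(enrichment_rows, term_sep="; ")
for gene, (count, terms) in table.items():
    print(gene, count, terms)
```

A newline separator, the default in the signature above, keeps each term on its own line within a spreadsheet cell, whereas a separator such as "; " is easier to read in CSV output.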

static save_table(table, name, saveformat='xlsx', index: bool = True)

Saves a DataFrame table to a file.

  • Parameters:
    • table: DataFrame
    • name: name of the file
    • saveformat: format of the saved file: ‘xlsx’ or ‘csv’
    • index: whether to include the index in the saved table
  • Returns: None

show_category_terms(category: str, show: [<class 'int'>, <class 'str'>] = 10, sort_by='genes', save: bool = False, savename='terms', saveformat='xlsx')

Displays all terms in the category and the number of genes associated with each.

  • Parameters:
    • category: name of the category. You can check the available categories by calling the ‘show_enrichment_categories’ method
    • show: “all” or an integer; the number of rows to display
    • sort_by: “genes” or “term” - sort by number of genes (descending) or by term name (ascending)
    • save: set to True to save the table. By default, it is saved in .xlsx format
    • savename: name of the file; used when save=True
    • saveformat: format of the saved file: ‘xlsx’ or ‘csv’
  • Returns: None

show_enrichest_terms_in_category(category: str, count: int = 10, sort_by='fdr', save: bool = False, savename='enrichment', saveformat='xlsx')

Shows the top count most enriched terms in the given category.

  • Parameters:
    • category: name of the category. You can check the available categories by calling the ‘show_enrichment_categories’ method
    • count: number of terms to show
    • sort_by: sort the list by one of ‘fdr’, ‘p_value’, ‘number_of_genes’
    • save: set to True to save the table. By default, it is saved in .xlsx format
    • savename: name of the file; used when save=True
    • saveformat: format of the saved file: ‘xlsx’ or ‘csv’
  • Returns: None
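
Selecting the top terms amounts to sorting by the chosen column and keeping the first count rows. The sketch below assumes that ‘fdr’ and ‘p_value’ are sorted in ascending order (smaller means more enriched) and ‘number_of_genes’ in descending order; the source does not state the sort direction. The rows and the top_terms helper are hypothetical.

```python
# Hypothetical enrichment rows for one category (values are made up).
rows = [
    {"term": "GO:0006915", "fdr": 0.010, "p_value": 0.0010, "number_of_genes": 12},
    {"term": "GO:0006281", "fdr": 0.001, "p_value": 0.0001, "number_of_genes": 5},
    {"term": "GO:0008283", "fdr": 0.040, "p_value": 0.0200, "number_of_genes": 30},
]


def top_terms(rows, count=10, sort_by="fdr"):
    """Return the first `count` rows after sorting by `sort_by`."""
    if sort_by not in ("fdr", "p_value", "number_of_genes"):
        raise ValueError("sort_by must be 'fdr', 'p_value' or 'number_of_genes'")
    # Assumption: larger gene counts rank first; smaller fdr/p_value rank first.
    descending = sort_by == "number_of_genes"
    return sorted(rows, key=lambda row: row[sort_by], reverse=descending)[:count]


by_fdr = [row["term"] for row in top_terms(rows, count=2)]
by_genes = [row["term"] for row in top_terms(rows, count=2, sort_by="number_of_genes")]
```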

show_enrichment_categories()

Shows the enrichment categories available for the current dataset.

  • Returns: None
