
Module for working with protein networks (gene ontology, enrichment, protein-protein interactions, etc.)

ProteinNetworks

The library contains convenient tools for rapid analysis of gene ontology, enrichment and protein-protein interaction data. It is based on the stringdb library. Some features require R to be installed (see EnrichmentAnalysis.prioretizingGO()).

The module will contain 4 sets of tools:

  • Enrichment Analysis
  • Protein networks Analysis
  • Group comparing tools
  • Visualization tools

Get Started

pip install -i https://test.pypi.org/simple/ ProteinNetworks==0.1.3


Enrichment Analysis

Contains a set of functions based on the stringdb library for gene ontology analysis and enrichment analysis. See the examples in the Colab Notebook.

ProteinNetworks.STRING_enrichment module

class ProteinNetworks.STRING_enrichment.EnrichmentAnalysis (data, enrichment=None, protein_id_type='UniProtID')

Bases: object

EnrichmentAnalysis class.

  • Parameters:
    • data: DataFrame containing the protein IDs for analysis. It must contain either a “Gene” or a “UniProtID” column
    • enrichment: Dataframe containing the results of previous enrichment analysis
    • protein_id_type: type of protein ID. Valid Types

static create_subframe_by_names(df, column: str, names: [list, tuple, set], add: str = 'first')

function finds the rows of the original dataset whose value in the selected column is one of the input names and returns them as a sub-dataframe

  • Parameters:
    • df – target DataFrame
    • column – the selected column in which names will be searched
    • names – list of target names whose records need to be found in the table
    • add – [‘first’, ‘last’, ‘all’] controls which of the found rows are added:

      ‘first’ - add only the first entry

      ‘last’ - add only the last entry

      ‘all’ - add all entries
  • Returns: sub-dataframe including input names in selected column
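
The effect of the add parameter can be illustrated with a small pure-Python sketch. It is not the library's implementation: the helper name select_rows_by_names and the use of a list of dicts in place of a DataFrame are assumptions made only to show what ‘first’, ‘last’ and ‘all’ are described as doing.

```python
def select_rows_by_names(rows, column, names, add="first"):
    """Keep rows whose `column` value is in `names` (hypothetical helper).

    rows is a list of dicts standing in for DataFrame rows.
    add selects which matching rows are kept per name:
    'first', 'last' or 'all'.
    """
    if add not in ("first", "last", "all"):
        raise ValueError("add must be 'first', 'last' or 'all'")

    wanted = set(names)
    matches = [row for row in rows if row[column] in wanted]
    if add == "all":
        return matches

    picked = {}
    for row in matches:
        key = row[column]
        if add == "first":
            picked.setdefault(key, row)  # keep the earliest row per name
        else:
            picked[key] = row            # overwrite, so the latest row wins
    return list(picked.values())


rows = [
    {"Gene": "TP53", "score": 1},
    {"Gene": "BRCA1", "score": 2},
    {"Gene": "TP53", "score": 3},
]
first_hit = select_rows_by_names(rows, "Gene", ["TP53"], add="first")
last_hit = select_rows_by_names(rows, "Gene", ["TP53"], add="last")
all_hits = select_rows_by_names(rows, "Gene", ["TP53"], add="all")
```

Here first_hit holds the row with score 1, last_hit the row with score 3, and all_hits both TP53 rows.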

drop_duplicated_genes(silent=False)

function for dropping duplicated genes

  • Parameters:
    • subset: (list) Only consider certain columns for identifying duplicates; by default, all columns are used.
  • Returns: DataFrame of dropped genes

get_category_terms(category: str, term_type: str = 'id')

function returns the set of all terms in the chosen category

  • Parameters:
    • category: Name of category

    • term_type: ‘id’ or ‘description’.

      id - returns the term IDs of the category (for example, GO terms)

      description - returns the descriptions of the term IDs of the category

  • Returns: set of terms
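
A minimal sketch of what term_type selects, under stated assumptions: the enrichment table is represented as a list of dicts, and the column names ‘category’ and ‘description’ are assumed (only the ‘term’ column is named in this documentation). The function category_terms is a hypothetical stand-in, not the method itself.

```python
def category_terms(enrichment_rows, category, term_type="id"):
    """Collect the terms of one category as a set (hypothetical helper).

    term_type='id' reads the assumed 'term' column,
    term_type='description' reads the assumed 'description' column.
    """
    columns = {"id": "term", "description": "description"}
    if term_type not in columns:
        raise ValueError("term_type must be 'id' or 'description'")
    key = columns[term_type]
    return {row[key] for row in enrichment_rows if row["category"] == category}


# Illustrative rows; the category labels are made up for the example.
enrichment_rows = [
    {"category": "Process", "term": "GO:0006915", "description": "apoptotic process"},
    {"category": "Process", "term": "GO:0006281", "description": "DNA repair"},
    {"category": "Component", "term": "GO:0005634", "description": "nucleus"},
]
process_ids = category_terms(enrichment_rows, "Process")
process_names = category_terms(enrichment_rows, "Process", term_type="description")
```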

get_enrichment()

function performs enrichment analysis. The results are stored in self.enrichment

  • Returns: None

get_genes_by_localization(compartments: list, set_operation: str, save=False)

function for getting proteins localized in target compartments. You can also apply common set operations to the gene sets of the compartments.

Example: get_genes_by_localization([‘Nucleus’, ‘Cytosol’], ‘union’) - returns proteins localized in Nucleus or Cytosol

  • Parameters:
    • compartments: list of compartments. Note:

      1. Capitalization of letters matters. Get available compartment names by calling get_components_list().

      2. The order of compartments matters if you want to get a set difference.

    • set_operation: operation between sets. The operation is applied sequentially to all sets from compartments: [A, B, C], 'intersection' -> A and B and C

      For example:

      get_genes_by_localization([‘Nucleus’, ‘Cytosol’], ‘difference’) - returns only nucleus proteins (nucleus proteins that are not also cytosol proteins)

      get_genes_by_localization([‘Cytosol’, ‘Nucleus’], ‘union’) - returns cytosol and nucleus proteins

      get_genes_by_localization([‘all’, ‘Nucleus’], ‘difference’) - returns all proteins except nucleus proteins
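
The sequential application of a set operation described above can be sketched with functools.reduce. This is an illustration of the semantics only, not the library's code: the function name combine_compartments and the example gene sets are assumptions.

```python
from functools import reduce


def combine_compartments(gene_sets, set_operation):
    """Apply one set operation left to right over a list of gene sets.

    [A, B, C] with 'intersection' gives (A & B) & C; with 'difference'
    it gives (A - B) - C, which is why the order of the sets matters.
    """
    operations = {
        "union": set.union,
        "intersection": set.intersection,
        "difference": set.difference,
    }
    if set_operation not in operations:
        raise ValueError("set_operation must be one of: " + ", ".join(operations))
    if not gene_sets:
        return set()
    return reduce(operations[set_operation], gene_sets)


# Made-up localization data for the example.
nucleus = {"TP53", "H2AX", "GAPDH"}
cytosol = {"GAPDH", "ACTB"}

only_nucleus = combine_compartments([nucleus, cytosol], "difference")
only_cytosol = combine_compartments([cytosol, nucleus], "difference")
either = combine_compartments([cytosol, nucleus], "union")
both = combine_compartments([nucleus, cytosol], "intersection")
```

Swapping the order of the two sets changes the result of ‘difference’ but not of ‘union’ or ‘intersection’.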

get_genes_of_term(term: str)

function gets the genes associated with a target term from the enrichment table

  • Parameters:
    • term: target GO term from column ‘term’ in enrichment table
  • Returns: list of genes associated with target term

get_mapped(species=9606)

function performs gene mapping: it finds STRING IDs by protein IDs. This is important for further analysis

  • Parameters:
    • species: ID of organism. For example, species=9606 for Human
  • Returns: None

prioretizingGO(terms: [list, set], organism='Human', domain='BP')

function for prioritizing GO terms using an R script with the GOxploreR package (doi:10.1038/s41598-020-73326-3). See ‘RScript Prioretizing_GO.R’. Works with R.4-3.x. You need to add RScript to PATH.

If you use this function in Google Colab, you will have to install the R packages at the first launch. This may take a long time (up to 20 minutes).

  • Parameters:
    • terms – list of GO terms
    • organism – name of target organism
    • domain – name of domain in the GO graph. Available inputs:

      ‘BP’ - Biological Process

      ‘CC’ - Cellular Component

      ‘MF’ - Molecular Function
  • Returns: list of prioritized GO terms

proteins_participation_in_the_category(df, category, term_type='id', term_sep='\n')

function checks which terms each protein participates in and builds a statistics table

  • Parameters:
    • df: target DataFrame

    • category: Name of category

    • term_type: ‘id’ or ‘description’.

      id - returns the term IDs of the category (for example, GO terms)

      description - returns the descriptions of the term IDs of the category

    • term_sep: the terms connected with each protein are saved in one cell. Choose the separator between terms

  • Returns: None
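
As a rough illustration of the kind of per-protein table this method describes, the sketch below groups terms by protein and joins them with term_sep. Everything structural here is an assumption: the helper name terms_per_protein, the list-of-dicts input, and the ‘category’, ‘description’ and ‘genes’ field names are hypothetical and may not match the library's internal tables.

```python
from collections import defaultdict


def terms_per_protein(enrichment_rows, category, term_type="id", term_sep="\n"):
    """Map each protein to its term count and a joined term string.

    Hypothetical helper: each row is assumed to carry a 'genes' list
    naming the proteins annotated with that row's term.
    """
    columns = {"id": "term", "description": "description"}
    if term_type not in columns:
        raise ValueError("term_type must be 'id' or 'description'")
    key = columns[term_type]

    collected = defaultdict(list)
    for row in enrichment_rows:
        if row["category"] != category:
            continue
        for gene in row["genes"]:
            collected[gene].append(row[key])

    # One "cell" per protein: all of its terms joined by the separator.
    return {
        gene: {"count": len(terms), "terms": term_sep.join(terms)}
        for gene, terms in collected.items()
    }


enrichment_rows = [
    {"category": "Process", "term": "GO:0006915",
     "description": "apoptotic process", "genes": ["TP53", "BAX"]},
    {"category": "Process", "term": "GO:0006281",
     "description": "DNA repair", "genes": ["TP53"]},
]
participation = terms_per_protein(enrichment_rows, "Process", term_sep="; ")
```

With this input, TP53 is counted in two terms and BAX in one.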

static save_table(table, name, saveformat='xlsx', index: bool = True)

function for saving DataFrame tables

  • Parameters:
    • table: DataFrame
    • name: name of file
    • saveformat: format of saving file: ‘xlsx’ or ‘csv’
    • index: whether to include the index in the saved table
  • Returns: None

show_category_terms(category: str, show: [int, str] = 10, sort_by='genes', save: bool = False, savename='terms', saveformat='xlsx')

function displays all terms in the category and the number of genes associated with each

  • Parameters:
    • category: Name of category. You can check the available categories by calling the ‘show_enrichment_categories’ method
    • show: “all” or an integer. Number of rows to display
    • sort_by: [“genes”, “term”] - sort by number of genes (descending) or by term names (ascending)
    • save: set to True to save the table. By default, it is saved in .xlsx format
    • savename: name of the file; used when save=True
    • saveformat: format of the saved file: ‘xlsx’ or ‘csv’
  • Returns: None

show_enrichest_terms_in_category(category: str, count: int = 10, sort_by='fdr', save: bool = False, savename='enrichment', saveformat='xlsx')

function shows the top count most enriched terms in category

  • Parameters:
    • category: Name of category. You can check the available categories by calling the ‘show_enrichment_categories’ method
    • count: number of terms to show
    • sort_by: sort the list by one of the ‘fdr’, ‘p_value’ or ‘number_of_genes’ parameters
    • save: set to True to save the table. By default, it is saved in .xlsx format
    • savename: name of the file; used when save=True
    • saveformat: format of the saved file: ‘xlsx’ or ‘csv’
  • Returns: None
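
The ranking step can be sketched as follows. The sort directions are an assumption (ascending for ‘fdr’ and ‘p_value’, since smaller values mean stronger enrichment, and descending for ‘number_of_genes’); the documentation above does not state them. The helper name top_enriched_terms, the list-of-dicts input and the ‘category’ field are also hypothetical.

```python
def top_enriched_terms(enrichment_rows, category, count=10, sort_by="fdr"):
    """Return the `count` best-ranked rows of one category.

    Assumed ordering: smallest first for 'fdr' and 'p_value',
    largest first for 'number_of_genes'.
    """
    if sort_by not in ("fdr", "p_value", "number_of_genes"):
        raise ValueError("sort_by must be 'fdr', 'p_value' or 'number_of_genes'")

    in_category = [row for row in enrichment_rows if row["category"] == category]
    descending = sort_by == "number_of_genes"
    ranked = sorted(in_category, key=lambda row: row[sort_by], reverse=descending)
    return ranked[:count]


# Made-up values for the example.
enrichment_rows = [
    {"category": "Process", "term": "GO:A", "fdr": 0.04, "p_value": 0.01, "number_of_genes": 12},
    {"category": "Process", "term": "GO:B", "fdr": 0.001, "p_value": 0.0002, "number_of_genes": 5},
    {"category": "Process", "term": "GO:C", "fdr": 0.01, "p_value": 0.003, "number_of_genes": 30},
    {"category": "Component", "term": "GO:D", "fdr": 0.0001, "p_value": 0.00001, "number_of_genes": 8},
]
by_fdr = [row["term"] for row in top_enriched_terms(enrichment_rows, "Process", count=2)]
by_size = [row["term"] for row in
           top_enriched_terms(enrichment_rows, "Process", count=2, sort_by="number_of_genes")]
```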

show_enrichment_categories()

function shows the available enrichment categories for the current dataset

  • Returns: None
