Module for working with protein networks (gene ontology, enrichment, protein-protein interactions, etc.)

ProteinNetworks

The library contains convenient tools for rapid analysis of gene ontology, enrichment, and protein-protein interaction data. It is based on the stringdb library. Some features require R to be installed (see EnrichmentAnalysis.prioretizingGO()).

The module will contain 4 sets of tools:

  • Enrichment Analysis
  • Protein networks Analysis
  • Group comparison tools
  • Visualization tools

Get Started

pip install -i https://test.pypi.org/simple/ ProteinNetworks==0.1.3


Enrichment Analysis

Contains a set of functions based on the stringdb library for gene ontology analysis and enrichment analysis. See the examples in the Colab Notebook.

ProteinNetworks.STRING_enrichment module

class ProteinNetworks.STRING_enrichment.EnrichmentAnalysis (data, enrichment=None, protein_id_type='UniProtID')

Bases: object

EnrichmentAnalysis class.

  • Parameters:
    • data: DataFrame containing the protein IDs for analysis. It must contain either a “Gene” or a “UniProtID” column
    • enrichment: DataFrame containing the results of a previous enrichment analysis
    • protein_id_type: type of protein ID. Valid Types

static create_subframe_by_names(df, column: str, names: [<class 'list'>, <class 'tuple'>, <class 'set'>], add: str = 'first')

function finds the rows of the original dataset whose value in the selected column matches one of the input names and returns them as a sub-DataFrame

  • Parameters:
    • df – target DataFrame
    • column – the selected column in which names will be searched
    • names – list of target names whose records need to be found in the table
    • add – how found rows are added, one of ‘first’, ‘last’, ‘all’.

      ‘first’ - add only the first entry

      ‘last’ - add only the last entry

      ‘all’ - add all entries
  • Returns: sub-dataframe including input names in selected column
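
The effect of the add parameter can be illustrated without the library. The sketch below is not the library's implementation: it mimics the documented behaviour on a plain list of dictionaries instead of a DataFrame, and the helper name select_rows_by_names and the sample records are hypothetical.

```python
def select_rows_by_names(rows, column, names, add="first"):
    """Return the rows whose `column` value is one of `names`.

    add='first' keeps the first match per name, add='last' keeps the
    last match per name, and add='all' keeps every match.
    """
    if add not in ("first", "last", "all"):
        raise ValueError("add must be 'first', 'last' or 'all'")
    wanted = set(names)
    matches = [row for row in rows if row[column] in wanted]
    if add == "all":
        return matches
    picked = {}
    for row in matches:
        key = row[column]
        # 'last' overwrites on every match, 'first' only stores the first one
        if add == "last" or key not in picked:
            picked[key] = row
    return list(picked.values())


# Hypothetical sample records with a duplicated gene
rows = [
    {"Gene": "TP53", "score": 1},
    {"Gene": "BRCA1", "score": 2},
    {"Gene": "TP53", "score": 3},
]
first_rows = select_rows_by_names(rows, "Gene", ["TP53"], add="first")
last_rows = select_rows_by_names(rows, "Gene", ["TP53"], add="last")
all_rows = select_rows_by_names(rows, "Gene", ["TP53"], add="all")
print(first_rows, last_rows, len(all_rows))
```

With a real DataFrame the same choice would normally be made per name, which is why duplicated genes in the input matter for the result.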

drop_duplicated_genes(silent=False)

function for dropping duplicated genes

  • Parameters:
    • subset: (list) Only consider certain columns for identifying duplicates; by default, all columns are used.
  • Returns: DataFrame of dropped genes

get_category_terms(category: str, term_type: str = 'id')

function returns set of all terms in chosen category

  • Parameters:
    • category: Name of category

    • term_type: ‘id’ or ‘description’.

      id - returns the term IDs of the category (for example, GO terms)

      description - returns the descriptions of the term IDs of the category

  • Returns: set of terms
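
As a rough illustration of the two term_type modes, the sketch below collects either IDs or descriptions from a small enrichment table. It is an assumption-laden stand-in, not the library's code: the rows are plain dictionaries, the keys "category" and "description" are assumed column names (only ‘term’ is named in this documentation), and the helper category_terms and the sample rows are hypothetical.

```python
def category_terms(enrichment_rows, category, term_type="id"):
    """Return the set of term IDs or descriptions found in one category."""
    # Assumed column names: 'term' holds the ID, 'description' the label
    column = {"id": "term", "description": "description"}[term_type]
    return {row[column] for row in enrichment_rows if row["category"] == category}


# Hypothetical enrichment rows
enrichment_rows = [
    {"category": "Process", "term": "GO:0006915", "description": "apoptotic process"},
    {"category": "Process", "term": "GO:0006281", "description": "DNA repair"},
    {"category": "Component", "term": "GO:0005634", "description": "nucleus"},
]
process_ids = category_terms(enrichment_rows, "Process", term_type="id")
process_names = category_terms(enrichment_rows, "Process", term_type="description")
print(sorted(process_ids))
print(sorted(process_names))
```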

get_enrichment()

function performs enrichment analysis. Results are stored in self.enrichment

  • Returns: None

get_genes_by_localization(compartments: list, set_operation: str, save=False)

function for getting proteins localized in the target compartments. You can also apply common set operations to the gene sets of the compartments

Example: get_genes_by_localization(['Nucleus', 'Cytosol'], 'union') - returns proteins localized in Nucleus or Cytosol

  • Parameters:
    • compartments: list of compartments. Note:

      1. Capitalization matters. Get the available compartment names by calling get_components_list().

      2. The order of compartments matters if you want to get a set difference.

    • set_operation: operation between sets. This means that the operations will be applied sequentially to all sets from the compartments. [A, B, C], 'intersection' -> A and B and C

      For example:

      get_genes_by_localization(['Nucleus', 'Cytosol'], 'difference') - returns proteins localized in the nucleus but not in the cytosol.

      get_genes_by_localization(['Cytosol', 'Nucleus'], 'union') - returns proteins localized in the cytosol or in the nucleus.

      get_genes_by_localization(['all', 'Nucleus'], 'difference') - returns all proteins except nucleus proteins.
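
The sequential application of one set operation to an ordered list of compartments can be sketched with functools.reduce. This is a minimal illustration under stated assumptions, not the library's implementation: the helper combine_compartments and the gene sets are hypothetical, and 'all' is modelled here as the union of the sample compartments.

```python
from functools import reduce

# The operations used in the examples above, mapped to Python set methods
OPERATIONS = {
    "union": set.union,
    "intersection": set.intersection,
    "difference": set.difference,
}


def combine_compartments(gene_sets, compartments, set_operation):
    """Fold one set operation, left to right, over the chosen compartments."""
    # Assumption: 'all' stands for every gene present in any compartment
    everything = set().union(*gene_sets.values())
    sets = [everything if name == "all" else gene_sets[name] for name in compartments]
    return reduce(OPERATIONS[set_operation], sets)


# Hypothetical localization data
gene_sets = {
    "Nucleus": {"TP53", "BRCA1", "MDM2"},
    "Cytosol": {"MDM2", "GAPDH"},
}
nucleus_only = combine_compartments(gene_sets, ["Nucleus", "Cytosol"], "difference")
cytosol_only = combine_compartments(gene_sets, ["Cytosol", "Nucleus"], "difference")
not_nucleus = combine_compartments(gene_sets, ["all", "Nucleus"], "difference")
print(sorted(nucleus_only))  # ['BRCA1', 'TP53']
print(sorted(cytosol_only))  # ['GAPDH']
print(sorted(not_nucleus))   # ['GAPDH']
```

Swapping the two compartments changes the result of 'difference' but not of 'union' or 'intersection', which is why the order of the list matters only for differences.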

get_genes_of_term(term: str)

function gets the genes associated with the target term from the enrichment table

  • Parameters:
    • term: target GO term from column ‘term’ in enrichment table
  • Returns: list of genes associated with target term

get_mapped(species=9606)

function performs gene mapping: it finds STRING IDs by protein IDs. This is required for further analysis

  • Parameters:
    • species: ID of the organism. For example, for Human species=9606
  • Returns: None

prioretizingGO(terms: [<class 'list'>, <class 'set'>], organism='Human', domain='BP')

function for prioritizing GO terms using an R script with the GOxploreR package (doi:10.1038/s41598-020-73326-3). See ‘RScript Prioretizing_GO.R’. Works with R 4.3.x. You need to add Rscript to PATH

If you use this function in Google Colab, you will have to install the R packages at the first launch. This may take a long time (up to 20 minutes)

  • Parameters:
    • terms – list of GO-terms
    • organism – name of target organism
    • domain – name of the domain in the GO graph. Available inputs:

      ‘BP’ - Biological Process

      ‘CC’ - Cellular Component

      ‘MF’ - Molecular Function
  • Returns: list of prioritized GO terms
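
Since this function depends on R being reachable from PATH, it can help to check for the executable before calling it. The sketch below uses only the standard library; the helper name rscript_available is hypothetical and is not part of ProteinNetworks.

```python
import shutil


def rscript_available(executable="Rscript"):
    """Return True if the given executable can be found on PATH."""
    return shutil.which(executable) is not None


if rscript_available():
    print("Rscript found on PATH")
else:
    print("Rscript not found: install R and add it to PATH first")
```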

proteins_participation_in_the_category(df, category, term_type='id', term_sep='\n')

function checks which terms of the category each protein participates in and builds a statistics table

  • Parameters:
    • df: target DataFrame

    • category: Name of category

    • term_type: ‘id’ or ‘description’.

      id - returns the term IDs of the category (for example, GO terms)

      description - returns the descriptions of the term IDs of the category

    • term_sep: the terms connected with each protein are saved in one cell. Choose the separator between terms

  • Returns: None
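
The role of term_sep can be shown with a small stand-alone sketch: all terms of a protein are joined into a single string, as they would be in one table cell. The code is an illustration only, not the library's implementation; the helper terms_per_protein, the "genes" key, and the sample rows are hypothetical.

```python
def terms_per_protein(enrichment_rows, proteins, term_sep="\n"):
    """Map each protein to its term count and its terms joined by term_sep."""
    table = {}
    for protein in proteins:
        # Assumption: every row lists the genes annotated with its term
        terms = [row["term"] for row in enrichment_rows if protein in row["genes"]]
        table[protein] = {"count": len(terms), "terms": term_sep.join(terms)}
    return table


# Hypothetical enrichment rows for one category
enrichment_rows = [
    {"term": "GO:0006915", "genes": ["TP53", "MDM2"]},
    {"term": "GO:0006281", "genes": ["TP53", "BRCA1"]},
]
stats = terms_per_protein(enrichment_rows, ["TP53", "BRCA1", "GAPDH"], term_sep="; ")
for protein, info in stats.items():
    print(protein, info["count"], info["terms"])
```

A separator such as "; " keeps each cell on one line, while the default "\n" puts every term on its own line inside the cell.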

static save_table(table, name, saveformat='xlsx', index: bool = True)

function for saving DataFrame tables

  • Parameters:
    • table: DataFrame
    • name: name of file
    • saveformat: format of saving file: ‘xlsx’ or ‘csv’
    • index: show indexes in saved table?
  • Returns: None

show_category_terms(category: str, show: [<class 'int'>, <class 'str'>] = 10, sort_by='genes', save: bool = False, savename='terms', saveformat='xlsx')

function displays all terms and number of associated genes in category

  • Parameters:
    • category: Name of category. You can check the available categories by calling the ‘show_enrichment_categories’ method
    • show: “all” or an integer. Number of rows to display
    • sort_by: [“genes”, “term”] - sort by number of genes (descending) or by term name (ascending)
    • save: set to True to save the table. By default, it is saved in .xlsx format
    • savename: work with save=True, name of file
    • saveformat: format of saving file: ‘xlsx’ or ‘csv’
  • Returns: None
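
How show and sort_by interact can be sketched on a plain mapping from term to gene count. This is a hypothetical stand-in written for illustration, not the method's actual code; the helper rank_terms and the counts are made up.

```python
def rank_terms(term_counts, show=10, sort_by="genes"):
    """Order (term, gene count) pairs and keep the first `show` of them."""
    if sort_by == "genes":
        # Most genes first
        ordered = sorted(term_counts.items(), key=lambda item: item[1], reverse=True)
    elif sort_by == "term":
        # Alphabetical by term name
        ordered = sorted(term_counts.items(), key=lambda item: item[0])
    else:
        raise ValueError("sort_by must be 'genes' or 'term'")
    return ordered if show == "all" else ordered[:show]


# Hypothetical gene counts per term
term_counts = {"GO:0006915": 12, "GO:0006281": 30, "GO:0005634": 7}
by_genes = rank_terms(term_counts, show=2, sort_by="genes")
by_term = rank_terms(term_counts, show="all", sort_by="term")
print(by_genes)  # [('GO:0006281', 30), ('GO:0006915', 12)]
print(by_term)   # [('GO:0005634', 7), ('GO:0006281', 30), ('GO:0006915', 12)]
```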

show_enrichest_terms_in_category(category: str, count: int = 10, sort_by='fdr', save: bool = False, savename='enrichment', saveformat='xlsx')

function shows the top ‘count’ most enriched terms in ‘category’

  • Parameters:
    • category: Name of category. You can check the available categories by calling the ‘show_enrichment_categories’ method
    • count: number of terms to show
    • sort_by: sort the list by one of the ‘fdr’, ‘p_value’, ‘number_of_genes’ parameters
    • save: set to True to save the table. By default, it is saved in .xlsx format
    • savename: work with save=True, name of file
    • saveformat: format of saving file: ‘xlsx’ or ‘csv’
  • Returns: None
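
Selecting the most enriched terms amounts to filtering by category, sorting, and truncating. The sketch below shows one way to do that on plain dictionaries. It is not the library's implementation: the sort direction (ascending for ‘fdr’ and ‘p_value’, descending for ‘number_of_genes’) is an assumption, and the helper top_enriched, the "category" key, and the sample values are hypothetical.

```python
def top_enriched(rows, category, count=10, sort_by="fdr"):
    """Return up to `count` rows of one category, best-ranked first."""
    if sort_by not in ("fdr", "p_value", "number_of_genes"):
        raise ValueError("sort_by must be 'fdr', 'p_value' or 'number_of_genes'")
    selected = [row for row in rows if row["category"] == category]
    # Assumption: small fdr/p_value is better, large number_of_genes is better
    descending = sort_by == "number_of_genes"
    return sorted(selected, key=lambda row: row[sort_by], reverse=descending)[:count]


# Hypothetical enrichment rows
rows = [
    {"category": "Process", "term": "GO:0006915", "fdr": 0.03, "p_value": 0.001, "number_of_genes": 12},
    {"category": "Process", "term": "GO:0006281", "fdr": 0.001, "p_value": 0.0001, "number_of_genes": 5},
    {"category": "Component", "term": "GO:0005634", "fdr": 0.01, "p_value": 0.002, "number_of_genes": 40},
]
best_by_fdr = top_enriched(rows, "Process", count=1, sort_by="fdr")
best_by_size = top_enriched(rows, "Process", count=1, sort_by="number_of_genes")
print(best_by_fdr[0]["term"])   # GO:0006281
print(best_by_size[0]["term"])  # GO:0006915
```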

show_enrichment_categories()

function shows the available enrichment categories for the current dataset

  • Returns: None
