GIP -- Gaussian Interaction Profiler

About

We introduce the Gaussian Interaction Profiler (GIP), a Gaussian mixture modeling-based clustering workflow for complexome profiling data. GIP assigns proteins to a set number of clusters by modeling the migration profile of each cluster. Using bootstrapping, GIP offers a way to prioritize actual interactors over spuriously comigrating proteins.

For a more complete description of the software and its applications, please refer to the manuscript (see here).

Installation

GIP is implemented as a Python package. It can be installed, together with its dependencies, from the Python Package Index (PyPI) using pip.

dependencies

pip

Installing with pip ensures that the dependencies are installed automatically alongside GIP.

pip install gip-bio

Installation from repository

git clone git@github.com:joerivstrien/gip-bio.git
cd gip-bio
pip install .

Usage

Input Data

GIP takes as input a single complexome profiling dataset. For each detected protein, this dataset contains a series of abundance values, one per fraction, that together represent the protein's migration pattern. A protein annotation file can optionally be provided as well.

  • complexome profile
  • protein annotation file

complexome profiling data

A single complexome profiling dataset, consisting of a matrix of abundance data. These data should be provided as a pandas.DataFrame, with the index containing protein identifiers. The GIP package contains a function (process_normalise.parse_profile) to load a complexome profile from a tab-separated text (tsv) file. An example of a file containing a complexome profile is available here.
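As a rough illustration of the expected layout, the sketch below builds a tiny profile by hand and round-trips it through tab-separated text using plain pandas. The protein identifiers, fraction labels and abundance values are made up. It also assumes that columns correspond to fractions, and pd.read_csv is only a stand-in: the exact behaviour of process_normalise.parse_profile (for example, any normalisation it applies) is not described here.

```python
import io

import pandas as pd

# Hypothetical toy profile: one row per protein, one column per fraction.
toy_profile = pd.DataFrame(
    {
        "fraction_1": [0.0, 5.0, 0.0],
        "fraction_2": [10.0, 20.0, 0.0],
        "fraction_3": [30.0, 5.0, 2.0],
        "fraction_4": [5.0, 0.0, 8.0],
    },
    index=pd.Index(["P00001", "P00002", "P00003"], name="protein_id"),
)

# Serialise to tab-separated text, as it would appear in a .tsv file.
tsv_text = toy_profile.to_csv(sep="\t")

# Read it back with the first column as the index (protein identifiers).
loaded = pd.read_csv(io.StringIO(tsv_text), sep="\t", index_col=0)

print(loaded)
```

The important property is that the protein identifiers end up in the DataFrame index rather than in a regular column.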

protein annotation file

To add protein annotations other than the identifiers to the output tables of a GIP analysis, a table containing these annotations can be provided. This table should be a pandas.DataFrame whose index contains protein identifiers that match those in the provided complexome profile. An example annotation file in tab-separated (tsv) format is available here.
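Because the annotation index has to match the identifiers in the profile, it can be worth checking the overlap before starting an analysis. The following is a minimal sketch using plain pandas; the helper name check_annotation_coverage and the toy data are hypothetical and not part of GIP.

```python
import pandas as pd


def check_annotation_coverage(profile, annotation):
    """Return the profile's protein identifiers that have no annotation row.

    Hypothetical helper, not part of the GIP package.
    """
    missing = profile.index.difference(annotation.index)
    return list(missing)


# Made-up example data: three proteins profiled, two of them annotated.
profile = pd.DataFrame(
    {"fraction_1": [1.0, 2.0, 3.0]},
    index=["P00001", "P00002", "P00003"],
)
annotation = pd.DataFrame(
    {"gene_name": ["GENE_A", "GENE_B"]},
    index=["P00001", "P00002"],
)

missing_ids = check_annotation_coverage(profile, annotation)
print(missing_ids)
```

An empty list means every protein in the profile has a matching annotation row.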

Running a complete GIP analysis

from gip.main import main
import gip.process_normalise as prn
import pandas as pd

# parse complexome profile and protein annotation file
prof = prn.parse_profile('path/to/profile.tsv')
annot = pd.read_csv('path/to/annot_fn.tsv',sep='\t',index_col=0)

# set ratio of clusters relative to number of detected proteins
clust_ratio = 0.5

# to run a standard run, using 4 threads for the bootstrapping
gip_results = main(prof, clust_ratio, annot_df=annot, bs_processes=4)
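
To give a feel for the clust_ratio setting, the sketch below estimates how many clusters a given ratio would request for a dataset of a given size. It assumes the cluster count is simply the ratio multiplied by the number of detected proteins, rounded to the nearest integer with a minimum of one. The exact rule GIP applies internally is not stated here, and expected_cluster_count is a hypothetical helper.

```python
def expected_cluster_count(n_proteins, clust_ratio):
    """Estimate the cluster count as clust_ratio * n_proteins.

    Hypothetical helper: the rounding and the minimum of one cluster
    are assumptions, not GIP's documented behaviour.
    """
    if n_proteins < 1:
        raise ValueError("n_proteins must be at least 1")
    if clust_ratio <= 0:
        raise ValueError("clust_ratio must be positive")
    return max(1, round(n_proteins * clust_ratio))


# Compare a few ratios for a made-up dataset of 1200 detected proteins.
for ratio in (0.25, 0.5, 1.0):
    print(ratio, expected_cluster_count(1200, ratio))
```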

Output

The main result of a GIP analysis is a set of clusters. The clusters are annotated with a variety of metrics that facilitate easy interpretation and prioritization of clusters likely corresponding to actual protein complexes. An overview of all resulting clusters with these features is provided in the output as a table ('clusttable'), which can optionally be saved to a file, using the "clusttable_fn" parameter.

Similarly, all clustered proteins are annotated with a number of metrics reflecting their assigned cluster, their abundance, and the consistency with which they are part of their cluster. All protein members are provided as a separate table ('membertable'), which can also optionally be saved to a file using the "membertable_fn" parameter.
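As a minimal sketch of how the two file-name parameters might be supplied, the helper below builds them for a chosen output directory. The helper output_table_kwargs, the file names and the directory are hypothetical. Passing these as keyword arguments to main is an assumption based on the parameter names given above.

```python
from pathlib import Path


def output_table_kwargs(out_dir, prefix="gip"):
    """Build file-name arguments for the cluster and member tables.

    Hypothetical helper; only the two parameter names come from the text.
    """
    out_dir = Path(out_dir)
    return {
        "clusttable_fn": str(out_dir / f"{prefix}_clusttable.tsv"),
        "membertable_fn": str(out_dir / f"{prefix}_membertable.tsv"),
    }


table_kwargs = output_table_kwargs("results")
print(table_kwargs)

# Assumed usage, with prof, clust_ratio and annot defined as in
# "Running a complete GIP analysis":
# gip_results = main(prof, clust_ratio, annot_df=annot, bs_processes=4, **table_kwargs)
```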

For a complete description of the output of a GIP analysis, please refer to the documentation of the main function here.

Licence

GIP -- Gaussian Interaction Profiler
Copyright (C) 2023 Radboud University Medical Center

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Affero General Public License as
published by the Free Software Foundation, either version 3 of the
License, or (at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License
along with this program.  If not, see <https://www.gnu.org/licenses/>.

Issues

If you have questions or encounter any problems or bugs, please report them in the issue channel.

Citing GIP

Publication Pending
