Weak Heuristic Inference for Supervisory Protein intERaction mapping for PDB and AP-MS datasets
whisper
whisper is a Python package for scoring protein–protein interactions from proximity labeling and affinity purification mass spectrometry datasets. It uses interpretable features, programmatic weak supervision, and decoy-based false discovery rate (FDR) estimation to identify high-confidence interactors.
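To make the idea of decoy-based FDR estimation concrete, here is a minimal, generic target-decoy sketch. It is not taken from whisper's source and may differ from what the package does internally; the function name `decoy_fdr` and the toy scores are hypothetical. At each target score s, the FDR is estimated as the number of decoys scoring at least s divided by the number of targets scoring at least s, and the estimates are then made monotone (q-values).

```python
import numpy as np

def decoy_fdr(target_scores, decoy_scores):
    """Generic target-decoy q-values (illustrative, not whisper's implementation)."""
    targets = np.asarray(target_scores, dtype=float)
    decoys = np.sort(np.asarray(decoy_scores, dtype=float))
    sorted_targets = np.sort(targets)

    # Count decoys and targets with score >= each target score.
    n_decoys = len(decoys) - np.searchsorted(decoys, targets, side="left")
    n_targets = len(sorted_targets) - np.searchsorted(sorted_targets, targets, side="left")

    # Raw FDR estimate, capped at 1. n_targets is always >= 1 (each target counts itself).
    fdr = np.minimum(n_decoys / n_targets, 1.0)

    # Enforce monotonicity: a higher score never gets a larger q-value than a lower one.
    order = np.argsort(-targets)
    q_sorted = np.minimum.accumulate(fdr[order][::-1])[::-1]
    q_values = np.empty_like(fdr)
    q_values[order] = q_sorted
    return q_values

# Toy scores, made up for illustration.
q = decoy_fdr([0.9, 0.8, 0.3], [0.85, 0.2, 0.1])
print(q)
```

With these toy scores the top-scoring target outranks every decoy, so its estimated FDR is zero, while the two lower-scoring targets share the same q-value after the monotonicity step.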
Installation
git clone https://github.com/camlab-bioml/whisper
cd whisper
pip install .
Input Format
- A CSV or tab-separated (TSV) file with:
  - One column named Protein
  - Other columns representing bait replicate intensities, named as BAIT_1, BAIT_2, etc.
- Control samples must be identifiable via substrings in their column names (e.g., "EGFP" or "Empty").
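As a rough illustration of this layout, the sketch below builds a tiny table in the described shape and separates control columns from bait columns by substring match. The protein IDs, intensity values, and the helper `split_columns` are hypothetical, and case-sensitive matching is an assumption; whisper's own matching logic may differ.

```python
import pandas as pd

# Toy intensity table; protein IDs and values are made up for illustration.
intensity_df = pd.DataFrame({
    "Protein": ["P1", "P2", "P3"],
    "BAIT_1": [1.2e6, 0.0, 3.4e5],
    "BAIT_2": [1.1e6, 2.0e4, 3.9e5],
    "EGFP_1": [1.0e4, 1.5e4, 3.0e5],
    "Empty_1": [0.0, 2.5e4, 2.8e5],
})

# Substrings that mark control samples, as in the usage example.
controls = ["EGFP", "Empty"]

def split_columns(df, control_substrings):
    """Return (control_columns, bait_columns) using case-sensitive substring matching."""
    sample_cols = [c for c in df.columns if c != "Protein"]
    control_cols = [c for c in sample_cols
                    if any(s in c for s in control_substrings)]
    bait_cols = [c for c in sample_cols if c not in control_cols]
    return control_cols, bait_cols

control_cols, bait_cols = split_columns(intensity_df, controls)
print("controls:", control_cols)
print("baits:", bait_cols)
```

Running a check like this before feature engineering is a cheap way to confirm that every control sample is picked up by at least one substring.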
Usage
# Protein-level
from whisper.protein_features import feature_engineering_protein
from whisper.protein_train import train_and_score_protein
import pandas as pd
# Load intensity table
intensity_df = pd.read_csv("input_intensity_dataset.tsv", sep="\t")
controls = ['EGFP', 'Empty', 'NminiTurbo']
# Run feature engineering
features_df = feature_engineering_protein(intensity_df, controls)
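# Optional: save the features so later runs with different settings can skip feature engineering
# (index=False assumes the row index carries no information)
features_df.to_csv("features.csv", index=False)
# To reuse saved features, load them instead of recomputing:
# features_df = pd.read_csv("features.csv")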
# Run scoring and FDR estimation
scored_df = train_and_score_protein(features_df, initial_positives=15, initial_negatives=200)
# Peptide-level
from whisper.peptide_features import feature_engineering_peptide
from whisper.peptide_train import train_and_score_peptide
import pandas as pd
# Load intensity table
intensity_df = pd.read_csv("input_intensity_dataset.tsv", sep="\t")
controls = ['EGFP', 'Empty', 'NminiTurbo']
# Run feature engineering
features_df = feature_engineering_peptide(intensity_df, controls)
# features_df = pd.read_csv("features.csv")
# Run scoring and FDR estimation
scored_df = train_and_score_peptide(features_df, initial_positives=15, initial_negatives=200)
# Fragment-level
from whisper.fragment_features import feature_engineering_fragment
from whisper.fragment_train import train_and_score_fragment
import pandas as pd
# Load intensity table
intensity_df = pd.read_csv("input_intensity_dataset.tsv", sep="\t")
controls = ['EGFP', 'Empty', 'NminiTurbo']
# Run feature engineering
features_df = feature_engineering_fragment(intensity_df, controls)
# features_df = pd.read_csv("features.csv")
# Run scoring and FDR estimation
scored_df = train_and_score_fragment(features_df, initial_positives=15, initial_negatives=200)
Output
The final output includes:
- predicted_probability: Probability of each bait–prey interaction being real
- FDR: Estimated false discovery rate
- global_cv_flag: Flag for likely background preys based on variability across all samples
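A typical next step is to keep only interactions that pass an FDR cutoff and are not flagged as background. The sketch below uses a toy stand-in for scored_df. Only the three column names listed above come from this document: the Protein column, the boolean encoding of global_cv_flag, and the 0.01 cutoff are assumptions for illustration, not package defaults.

```python
import pandas as pd

# Toy stand-in for the scored output; all values are made up.
scored_df = pd.DataFrame({
    "Protein": ["P1", "P2", "P3", "P4"],          # assumed identifier column
    "predicted_probability": [0.98, 0.91, 0.40, 0.95],
    "FDR": [0.005, 0.03, 0.40, 0.008],
    "global_cv_flag": [False, False, False, True],  # assumed boolean encoding
})

fdr_cutoff = 0.01  # illustrative threshold, not a whisper default

# Keep rows under the FDR cutoff that are not flagged as likely background.
passes_fdr = scored_df["FDR"] <= fdr_cutoff
not_background = ~scored_df["global_cv_flag"].astype(bool)
high_conf = (
    scored_df[passes_fdr & not_background]
    .sort_values("predicted_probability", ascending=False)
)
print(high_conf)
```

Here P4 passes the FDR cutoff but is dropped because it is flagged as likely background, which is why the flag is worth checking alongside the FDR.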
Tutorial
Citation
This software is authored by: Vesal Kasmaeifar, Kieran R Campbell
Lunenfeld-Tanenbaum Research Institute & University of Toronto