
Analysis of bottom-up mass-spectrometry proteomics data.

ProteoPy


ProteoPy is a Python library that brings quantitative proteomics into the AnnData ecosystem. It provides a unified and extensible framework for protein- and peptide-level analysis — from data import through quality control, preprocessing, and differential abundance testing — while storing all data and metadata in a single portable object.

Official documentation: proteopy.readthedocs.io

Why ProteoPy?

Mass spectrometry-based proteomics lacks a standardized data structure in Python comparable to what AnnData provides for single-cell transcriptomics. Existing tools rely on distinct formats and scripting environments, forcing researchers to learn multiple ecosystems and making multi-omics integration cumbersome. ProteoPy bridges this gap by adopting the proven AnnData framework, enabling:

  • Familiar workflows for users of scanpy, squidpy, and the broader single-cell Python ecosystem
  • Reproducible analyses with all processing steps tracked in a single object
  • Seamless multi-omics integration via direct compatibility with MuData and MUON
  • Direct scanpy integration for dimensionality reduction, clustering, and visualization, plus compatibility with single-cell analysis tools

Key Features

  • Flexible data import from DIA-NN, MaxQuant, and generic tabular formats
  • Quality control & filtering with completeness metrics, CV analysis, and contaminant removal
  • Preprocessing including normalization, batch correction (via scanpy), and missing-value imputation
  • Peptide-level analysis with overlapping peptide grouping, peptide-to-protein quantification, and per-protein peptide intensity visualization
  • Differential abundance analysis with t-test, Welch's t-test, and multiple-testing correction
  • Proteoform inference via a reimplementation of the COPF algorithm for detecting functional proteoform groups from peptide-level data
  • Publication-ready visualizations for QC, exploratory analysis, and statistical results
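
To make the quality-control metrics in the list above concrete, here is a minimal standard-library sketch of per-protein completeness and coefficient of variation (CV). It is an illustration only, not ProteoPy's implementation: the function names, the example intensities, and the 0.8 threshold are hypothetical, and ProteoPy's own functions operate on AnnData objects rather than plain dictionaries.

```python
import math
import statistics


def observed(values):
    """Return the non-missing (non-NaN) intensities of one protein."""
    return [v for v in values if not math.isnan(v)]


def completeness(values):
    """Fraction of samples in which the protein was quantified."""
    return len(observed(values)) / len(values)


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean of observed values."""
    obs = observed(values)
    if len(obs) < 2:
        return math.nan  # CV is undefined for fewer than two observations
    mean = statistics.fmean(obs)
    if mean == 0:
        return math.nan  # avoid division by zero
    return statistics.stdev(obs) / mean


# Hypothetical raw intensities: protein -> one value per sample
nan = math.nan
intensities = {
    "P1": [100.0, 110.0, 90.0, 105.0, 95.0],
    "P2": [200.0, nan, nan, 210.0, nan],
    "P3": [50.0, 55.0, nan, 45.0, 50.0],
}

# Keep proteins quantified in at least 80% of samples (assumed threshold)
min_fraction = 0.8
kept = {p: v for p, v in intensities.items() if completeness(v) >= min_fraction}

# CV on raw (non-log) intensities for the proteins that passed the filter
cvs = {p: coefficient_of_variation(v) for p, v in kept.items()}
print(sorted(kept))  # P1 and P3 pass; P2 is observed in only 2 of 5 samples
```

Computing the CV on raw rather than log-transformed intensities is a deliberate choice in this sketch, since the ratio of standard deviation to mean is only meaningful on a scale with a true zero.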

Installation

ProteoPy requires Python 3.10 or later. We recommend installing ProteoPy in a dedicated virtual environment:

# Using venv
python -m venv proteopy-env
source proteopy-env/bin/activate  # Linux/macOS
# proteopy-env\Scripts\activate   # Windows
pip install ipykernel
python -m ipykernel install --user --name=proteopy-env

# Using conda
conda create -n proteopy-env "python>=3.10"
conda activate proteopy-env
pip install ipykernel
python -m ipykernel install --user --name=proteopy-env

# Using uv
uv venv proteopy-env
source proteopy-env/bin/activate
uv pip install ipykernel
python -m ipykernel install --user --name=proteopy-env

Then install ProteoPy:

pip install proteopy

For notebook-centric workflows, the [usage] extra installs ipykernel, jupyterlab, and scanpy (for extended analysis functionality such as batch correction, PCA, and UMAP):

pip install "proteopy[usage]"  # quotes stop shells such as zsh from expanding the brackets

To install the development version from GitHub:

pip install git+https://github.com/UKHD-NP/proteopy.git
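
After installing, it can be useful to confirm which versions ended up in the active environment. The sketch below uses only the standard library's importlib.metadata; the helper name installed_version is hypothetical, and the list of packages checked is just an example based on the dependencies mentioned above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Distribution names as used with pip; scanpy is only present if it was
# installed separately or via the [usage] extra.
for name in ("proteopy", "anndata", "scanpy"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

Run this with the interpreter of the environment created above (for example from a notebook using the proteopy-env kernel) so that it reports on the right installation.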

Documentation

Full documentation, including API reference and tutorials, is available at proteopy.readthedocs.io.

Tutorials

  • Protein-level analysis — Complete workflow from data import to differential abundance analysis (notebook)
  • Proteoform inference — Detecting functional proteoform groups with COPF (notebook)

Quick Start

import proteopy as pr
import scanpy as sc

# Load example dataset
adata = pr.datasets.karayel_2020()

# Quality control: filter by completeness
pr.pp.filter_var_completeness(adata, min_fraction=0.8, zero_to_na=True)

# Preprocessing
pr.pp.normalize_median(adata, log_space=True)
pr.pp.impute_downshift(adata, downshift=1.8, width=0.3)

# Differential abundance analysis
pr.tl.differential_abundance(adata, method="ttest_two_sample", group_by="cell_type")

# Visualize results
pr.pl.volcano_plot(adata, varm_slot="ttest_two_sample;cell_type;Ortho_vs_rest")

# Seamless scanpy integration for dimensionality reduction
sc.tl.pca(adata)
sc.pl.pca(adata, color="cell_type")
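
For readers unfamiliar with the preprocessing steps used in the Quick Start, the following standard-library sketch shows one common way median normalization and down-shift imputation are defined. It is not ProteoPy's implementation. It assumes the intensities are already log-transformed, that normalization aligns every sample to the mean of the sample medians, and that downshift and width are expressed in units of the observed standard deviation of each sample (the convention popularized by Perseus). The function bodies and example values are hypothetical, and ProteoPy's behavior may differ.

```python
import math
import random
import statistics


def observed(sample):
    """Return the non-missing (non-NaN) values of one sample."""
    return [v for v in sample if not math.isnan(v)]


def normalize_median(samples):
    """Shift each sample so that all samples share the same median."""
    medians = [statistics.median(observed(s)) for s in samples]
    target = statistics.fmean(medians)
    # NaN stays NaN under the shift, so missing values are preserved
    return [[v - m + target for v in s] for s, m in zip(samples, medians)]


def impute_downshift(sample, downshift=1.8, width=0.3, rng=None):
    """Replace NaNs with draws from a narrowed, down-shifted normal distribution."""
    if rng is None:
        rng = random.Random(0)
    obs = observed(sample)
    mean, sd = statistics.fmean(obs), statistics.stdev(obs)
    return [
        rng.gauss(mean - downshift * sd, width * sd) if math.isnan(v) else v
        for v in sample
    ]


# Hypothetical log2 intensities: one inner list per sample
nan = math.nan
log_intensities = [
    [20.0, 22.0, nan, 24.0, 26.0],
    [21.0, 23.0, 25.0, nan, 27.0],
]

normalized = normalize_median(log_intensities)
rng = random.Random(42)  # fixed seed so the imputed values are reproducible
imputed = [impute_downshift(s, rng=rng) for s in normalized]
```

Drawing imputed values from the low end of the distribution reflects the assumption that values are missing mainly because they fell below the detection limit; if values are missing at random, a different imputation strategy is usually more appropriate.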

Citing ProteoPy

If you use ProteoPy in your research, please cite:

Fichtner ID, Sahm F, Gerstung M, Bludau I. ProteoPy: an AnnData-based framework for integrated proteomics analysis. UNPUBLISHED (2025).

@article{fichtner2025proteopy,
  title={ProteoPy: an AnnData-based framework for integrated proteomics analysis},
  author={Fichtner, Ian Dirk and Sahm, Felix and Gerstung, Moritz and Bludau, Isabell},
  journal={UNPUBLISHED},
  year={2025}
}

If you use the COPF proteoform inference functionality, please also cite:

Bludau I, et al. Systematic detection of functional proteoform groups from bottom-up proteomic datasets. Nat. Commun. 12, 3810 (2021). doi:10.1038/s41467-021-24030-x

License

ProteoPy was developed by the Bludau Lab at the Neuropathology Department Heidelberg and is freely available under the Apache 2.0 license. External Python dependencies (see pyproject.toml file) have their own licenses, which can be consulted on their respective websites.
