Utility functions for proteomics data analysis
Reason this release was yanked:
Broken annotation downloads
Dousatsu
Dousatsu is a Python library for the analysis of quantitative mass spectrometry-based proteomics data. It provides a set of tools for feature preprocessing, analysis, selection, and visualization, enabling a comprehensive workflow from raw data to biological insights.
The library is designed to be modular and easy to use, with a focus on integrating with the scientific Python ecosystem, including pandas, numpy, scikit-learn, and statsmodels.
Core Modules
Dousatsu is organized into four main modules, each addressing a specific step in the proteomics data analysis pipeline:
feature_preprocessing
This module provides a suite of tools for cleaning, normalizing, and transforming raw proteomics data into an analysis-ready format. Key functionalities include:
- Data Loading: Functions to load the outputs of common proteomics software such as TRIC, DIA-NN, and Spectronaut.
- Data Cleaning: Transformers to remove contaminants, non-proteotypic peptides, and low-quality data based on intensity and q-value cutoffs.
- Data Formatting: Tools to reshape data from wide to long format and to standardize column names.
- Normalization: Methods for median and quantile normalization to correct for systematic variations between samples.
- Missing Value Imputation: Strategies to handle missing values, a common issue in proteomics data.
The preprocessing steps are implemented as scikit-learn compatible transformers, allowing them to be chained together in a Pipeline.
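As a rough illustration of how scikit-learn compatible preprocessing steps chain together, the sketch below builds a small Pipeline from a log2 transform, a median normalization step, and median imputation. The `MedianNormalizer` class and the toy intensity table are hypothetical and written only for this example; they are not taken from Dousatsu's API, whose actual transformer names may differ.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer


class MedianNormalizer(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: shift every sample (row) to a common median."""

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        # Reference level: median of the per-sample medians, ignoring NaNs.
        self.target_ = np.nanmedian(np.nanmedian(X, axis=1))
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        sample_medians = np.nanmedian(X, axis=1, keepdims=True)
        return X - sample_medians + self.target_


# Toy intensity matrix (assumed layout): rows are samples, columns are peptides.
raw = pd.DataFrame(
    {
        "PEPTIDE_A": [1024.0, 2048.0, 512.0],
        "PEPTIDE_B": [4096.0, np.nan, 2048.0],
        "PEPTIDE_C": [256.0, 512.0, 128.0],
    },
    index=["sample_1", "sample_2", "sample_3"],
)

pipeline = Pipeline(
    steps=[
        ("log2", FunctionTransformer(np.log2)),        # intensities to log2 scale
        ("median_norm", MedianNormalizer()),           # align sample medians
        ("impute", SimpleImputer(strategy="median")),  # fill gaps per peptide
    ]
)

processed = pipeline.fit_transform(raw)
print(processed)
```

Normalizing before imputing keeps the filled-in values on the same scale as the corrected samples; other orderings are possible depending on the imputation strategy.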
feature_analysis
Once the data is preprocessed, this module offers functions for statistical analysis to identify differentially abundant proteins or peptides. Features include:
- Statistical Tests: Implementation of two-sample t-tests with corrections for multiple testing (e.g., Benjamini-Hochberg).
- Fold Change Calculation: Functions to calculate log2 fold changes between different conditions.
- Correlation Analysis: Tools to assess the correlation between technical replicates.
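The statistical steps listed above can be sketched with NumPy and SciPy. This is a minimal example on simulated log2 intensities, assuming proteins in rows and replicates in columns; the `benjamini_hochberg` helper is written here for illustration and is not Dousatsu's function. The example uses Welch's variant of the two-sample t-test (`equal_var=False`), which is one reasonable choice among several.

```python
import numpy as np
from scipy import stats


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity starting from the largest p-value, then cap at 1.
    adjusted_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(adjusted_sorted, 0.0, 1.0)
    return adjusted


# Simulated log2 intensities: 50 proteins x 4 replicates per condition.
rng = np.random.default_rng(0)
n_proteins, n_replicates = 50, 4
control = rng.normal(loc=20.0, scale=0.3, size=(n_proteins, n_replicates))
treatment = rng.normal(loc=20.0, scale=0.3, size=(n_proteins, n_replicates))
treatment[:5] += 3.0  # the first five proteins are truly more abundant

# On the log2 scale, the fold change is a difference of means.
log2_fc = treatment.mean(axis=1) - control.mean(axis=1)

# One Welch t-test per protein, followed by multiple-testing correction.
result = stats.ttest_ind(treatment, control, axis=1, equal_var=False)
adjusted = benjamini_hochberg(result.pvalue)

significant = (adjusted < 0.05) & (np.abs(log2_fc) > 1.0)
print(f"{significant.sum()} proteins pass both thresholds")
```

The same correction is available as `multipletests(..., method="fdr_bh")` in `statsmodels.stats.multitest`; the hand-written helper only keeps the sketch dependency-light.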
feature_selection
This module helps in identifying the most informative features (peptides or proteins) for building predictive models or for biomarker discovery. It includes:
- Recursive Feature Elimination (RFE): A cross-validated RFE implementation to select the most stable and predictive features.
- Visualization: Functions to visualize the results of the feature selection process.
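For orientation, cross-validated recursive feature elimination can be sketched with scikit-learn's own `RFECV`. This uses synthetic data and a logistic regression estimator chosen for the example; Dousatsu's implementation and its defaults may differ.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-in for a samples x proteins matrix with class labels.
X, y = make_classification(
    n_samples=60,
    n_features=20,
    n_informative=4,
    n_redundant=0,
    random_state=0,
)

selector = RFECV(
    estimator=LogisticRegression(max_iter=1000),
    step=1,  # drop one feature per elimination round
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    scoring="accuracy",
    min_features_to_select=2,
)
selector.fit(X, y)

# Column indices of the features kept at the best cross-validated size.
selected = [i for i, keep in enumerate(selector.support_) if keep]
print(f"kept {selector.n_features_} features: {selected}")
```

Features with `ranking_ == 1` are the retained ones; higher ranks indicate earlier elimination.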
feature_visualization
This module provides visualization functions for exploring the data and presenting the results of the analysis:
- Dimensionality Reduction: PCA plots to visualize sample clustering and identify batch effects.
- Differential Abundance: Volcano plots to visualize the results of statistical tests.
- Heatmaps: Clustermaps to visualize the expression patterns of proteins or peptides across samples.
- Data Quality: Plots for visualizing intensity distributions and missingness.
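A PCA sample plot of the kind described above can be sketched with scikit-learn and matplotlib. The data, group labels, and styling below are invented for the example and do not reflect Dousatsu's plotting functions.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
from sklearn.decomposition import PCA

# Simulated log2 intensities: two groups of 6 samples, 100 proteins each.
rng = np.random.default_rng(1)
group_a = rng.normal(20.0, 1.0, size=(6, 100))
group_b = rng.normal(20.0, 1.0, size=(6, 100))
group_b[:, :20] += 2.0  # group B differs in the first 20 proteins
X = np.vstack([group_a, group_b])
labels = np.array(["A"] * 6 + ["B"] * 6)

# PCA centers the data internally before projecting onto two components.
pca = PCA(n_components=2)
scores = pca.fit_transform(X)

fig, ax = plt.subplots(figsize=(5, 4))
for label in np.unique(labels):
    mask = labels == label
    ax.scatter(scores[mask, 0], scores[mask, 1], label=f"group {label}")
ax.set_xlabel(f"PC1 ({pca.explained_variance_ratio_[0]:.1%} of variance)")
ax.set_ylabel(f"PC2 ({pca.explained_variance_ratio_[1]:.1%} of variance)")
ax.legend()
fig.tight_layout()
# fig.savefig("pca_samples.png", dpi=150)  # uncomment to write the figure to disk
```

Coloring the same scores by processing batch instead of biological group is a quick way to check for batch effects.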
Development Environment
This repository is set up for development inside a Docker container to ensure a consistent and reproducible environment.
Requirements
- Docker
How to use
Initial setup
- Clone the repository.
- Build and start the development container:
./start_dev.sh
- The first time you start the container, install the pre-commit hooks:
pre-commit install
Developing
- The project directory is mounted inside the container at /App, so you can edit the files on your host machine with your favorite editor.
- Run all git commands from within the container.
- Install the package in editable mode to test your changes:
pip install -e .
- To stop the container, run:
./stop_dev.sh
This will also remove the container, so you can start fresh the next time.
File details
Details for the file dousatsu-0.2.0.tar.gz.
File metadata
- Download URL: dousatsu-0.2.0.tar.gz
- Upload date:
- Size: 69.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c389b035f28b7894317169ee372533bbc785a80bd869b8c7447278b3ec2ba46d |
| MD5 | d7df3a8626a12f88698ba08923c12f76 |
| BLAKE2b-256 | 6577201e25374a6694576ba74816cac907d4e0c767159184bbabd9295cb95e21 |
File details
Details for the file dousatsu-0.2.0-py3-none-any.whl.
File metadata
- Download URL: dousatsu-0.2.0-py3-none-any.whl
- Upload date:
- Size: 81.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 86d0f8c82009d7d704093a2ac47c3cae5b44fc199acb3e0a4512d2e12046dbc3 |
| MD5 | e4f4cedb20935a9e8140e8f0f3dcaf5d |
| BLAKE2b-256 | 85882bbb9935a0850f9a9a69692abb1040e943fb654f983f361be6bd5fd7cf0f |