Pro-ker Proteomics Analysis — an interactive browser-based proteomics data visualization tool
Pro-ker Proteomics Analysis
An interactive browser-based tool for proteomics data visualization and statistical analysis. Upload MaxQuant proteinGroups.txt files and explore your data through configurable plots, group comparisons, and export-ready figures.
Installation
pip install proker
Quick Start
proker # launch the viewer on default port 8050
proker data.txt # launch and auto-load a file
proker --port 9000 # use a custom port
proker --install # create a desktop shortcut
proker --update # check for updates
One-click installers are also provided for Windows (Install_Windows.bat) and macOS (Install_macOS.command).
Features
- Upload & Parse — MaxQuant proteinGroups.txt (tab-separated)
- Sample Grouping — Assign samples to groups by shared prefix or manually
- Processing — Intensity/LFQ selection, normalization, pruning, missing-value filtering
- Derived Groups — Create ratio, difference, sum, or product groups from existing groups
- Visualizations — Volcano plots, dot plots, enrichment plots, unique protein plots, PCA
- Graph Settings — Per-plot marker styling, grid, background, font, threshold line toggles
- Canvas — Drag, resize, freeze, annotate, and right-click label multiple plots
- Themes — Built-in dark/light presets and custom color themes
- Export — SVG (vector), PNG (high-resolution raster), and CSV (full analysis data)
- Sessions — Save/load full analysis state as JSON; auto-save on every change
Statistical Methods
Volcano Plot — Differential Expression Analysis
Volcano plots visualize fold change vs. statistical significance for pairwise group comparisons. Each point represents a protein; its x-position is the log2 fold change and y-position is the -log10 FDR-adjusted p-value.
Welch's t-test
Group means are compared using Welch's t-test (two-sample, unequal variance). The test statistic is:
t = |mean_A - mean_B| / (SE + S0)
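where SE = sqrt(var_A/n_A + var_B/n_B) is the unpooled standard error of the difference in means (var_A and var_B are the sample variances, n_A and n_B the group sizes) and S0 is an optional fudge factor (see below). Degrees of freedom are estimated via the Welch-Satterthwaite approximation: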
df = (var_A/n_A + var_B/n_B)^2 / ((var_A/n_A)^2/(n_A-1) + (var_B/n_B)^2/(n_B-1))
Two-sided p-values are obtained from Student's t distribution with df degrees of freedom, evaluated via the regularized incomplete beta function.
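The formulas above can be sketched in plain Python. This is a minimal illustration, not Pro-ker's own code: the function names (`welch_t_test`, `student_t_two_sided_p`) are hypothetical, the p-value is assumed to be two-sided, and each group is assumed to hold at least two values with non-zero variance. The incomplete beta function is evaluated with a standard continued-fraction expansion.

```python
import math
from statistics import mean, variance


def _betacf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the recurrence.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the recurrence.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def regularized_incomplete_beta(a, b, x):
    """I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log(1.0 - x))
    front = math.exp(log_front)
    # Use the symmetry relation where the continued fraction converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def student_t_two_sided_p(t, df):
    """Two-sided p-value: P(|T| >= |t|) = I_{df/(df+t^2)}(df/2, 1/2)."""
    return regularized_incomplete_beta(df / 2.0, 0.5, df / (df + t * t))


def welch_t_test(group_a, group_b, s0=0.0):
    """Return (t, df, p) following the formulas in the text."""
    n_a, n_b = len(group_a), len(group_b)
    va = variance(group_a) / n_a  # var_A / n_A (sample variance)
    vb = variance(group_b) / n_b  # var_B / n_B
    se = math.sqrt(va + vb)
    t = abs(mean(group_a) - mean(group_b)) / (se + s0)
    df = (va + vb) ** 2 / (va ** 2 / (n_a - 1) + vb ** 2 / (n_b - 1))
    return t, df, student_t_two_sided_p(t, df)


t, df, p = welch_t_test([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
print(f"t = {t:.4f}, df = {df:.4f}, p = {p:.4f}")
```

For the two example groups, t equals sqrt(3) and df equals 1875/425, which can be checked by hand from the formulas.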
Benjamini-Hochberg FDR Correction
Raw p-values are adjusted for multiple testing using the Benjamini-Hochberg procedure to control the false discovery rate:
adjusted_p[i] = min(adjusted_p[i+1], raw_p[i] * n / rank[i])
where p-values are sorted in ascending order and adjusted from the largest rank downward. This controls the expected proportion of false positives among rejected hypotheses.
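As an illustration of the step-down recurrence above, here is a short Python sketch. The function name `benjamini_hochberg` is hypothetical, and capping adjusted values at 1.0 is a common convention assumed here rather than something the text states.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the same order as the input."""
    n = len(p_values)
    # Indices of the p-values sorted in ascending order; rank = position + 1.
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0  # assumed cap: adjusted p-values never exceed 1
    # Walk from the largest rank down, carrying the running minimum.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(raw))
```

With these four raw p-values, the adjusted values work out to approximately 0.02, 0.04, 0.04, and 0.02 (in input order): the third value, 0.03 * 4 / 3, ties with the largest one at 0.04.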
S0 Low-Abundance Correction
The S0 parameter (Tusher et al., 2001; used by Perseus/SAM) is a fudge factor added to the t-test denominator to penalize fold changes driven by low-abundance noise:
t_s0 = |mean_A - mean_B| / (SE + S0)
- At S0 = 0 (default), this is a standard Welch's t-test
- At S0 > 0, proteins with small standard error (typically low-abundance) require larger absolute differences to reach significance
- Typical values: 0.1 to 2.0 (Perseus default: 0.1)
This prevents the common problem where low-abundance proteins show extreme fold changes purely due to measurement noise, flooding volcano plot extremes with unreliable hits.
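A small worked example may help show why S0 matters most when the standard error is small. The helper name `moderated_t` and the input numbers are made up for illustration only.

```python
def moderated_t(mean_a, mean_b, se, s0=0.0):
    """t statistic with the S0 fudge factor added to the denominator."""
    return abs(mean_a - mean_b) / (se + s0)


# Same difference in means (1.0), but very different standard errors.
for label, se in [("small SE", 0.1), ("large SE", 1.0)]:
    plain = moderated_t(1.0, 0.0, se)
    damped = moderated_t(1.0, 0.0, se, s0=0.5)
    print(f"{label}: t = {plain:.3f} -> {damped:.3f} with S0 = 0.5")
```

In this toy case the small-SE statistic drops from about 10 to about 1.67, while the large-SE statistic only drops from 1.0 to about 0.67, so the correction mostly affects hits whose significance rests on a very small standard error.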
Log2 Fold Change
Fold change is calculated as:
log2FC = log2(mean_GroupX / mean_GroupY)
- Positive values (right side of the plot) = higher abundance in Group X
- Negative values (left side of the plot) = higher abundance in Group Y
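The sign convention can be checked with a few lines of Python. This sketch assumes linear-scale (not log-transformed) intensities with positive group means; the function name `log2_fold_change` is hypothetical.

```python
import math
from statistics import mean


def log2_fold_change(group_x, group_y):
    """log2 of the ratio of group means (linear-scale intensities assumed)."""
    return math.log2(mean(group_x) / mean(group_y))


print(log2_fold_change([8.0, 8.0], [2.0, 2.0]))  # positive: higher in Group X
print(log2_fold_change([2.0, 2.0], [8.0, 8.0]))  # negative: higher in Group Y
```

A fourfold higher mean in Group X gives a log2 fold change of 2, and swapping the groups flips the sign.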
Missing Value Imputation
When enabled, zero/missing intensity values are replaced with half the minimum detected non-zero value for that protein across the relevant samples. This is a simple down-shift imputation approach that assumes missing values arise from proteins below the detection limit.
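The rule can be sketched for a single protein as follows. The function name `impute_half_min` is hypothetical, missing values are assumed to appear as `None`, NaN, or non-positive numbers, and leaving a row untouched when nothing was detected is an assumption of this sketch, not documented behavior.

```python
def _is_detected(value):
    """True for a real, positive intensity (None, NaN, and zeros are missing)."""
    return value is not None and value > 0


def impute_half_min(values):
    """Replace missing entries with half the smallest detected value."""
    detected = [v for v in values if _is_detected(v)]
    if not detected:
        # Nothing to base the imputation on; return the values unchanged.
        return list(values)
    fill = min(detected) / 2.0
    return [v if _is_detected(v) else fill for v in values]


print(impute_half_min([0.0, 4.0, 10.0, None]))
```

Here the smallest detected intensity is 4.0, so both the zero and the `None` entry are replaced with 2.0.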
Significance Thresholds
Points are classified as significant when both conditions are met:
- FDR-adjusted p-value < FDR threshold (default: 0.05)
- |log2 FC| >= fold change threshold (default: 1.0)
Dotted reference lines on the plot mark these thresholds and can be toggled off via Graph Settings.
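The two conditions translate directly into a predicate. The function and parameter names below are hypothetical; note the strict inequality on the adjusted p-value and the inclusive one on the fold change, as stated above.

```python
def is_significant(adj_p, log2_fc, fdr_threshold=0.05, fc_threshold=1.0):
    """True when a protein passes both the FDR and fold-change cutoffs."""
    return adj_p < fdr_threshold and abs(log2_fc) >= fc_threshold


print(is_significant(0.01, 1.0))    # passes both cutoffs
print(is_significant(0.05, 2.0))    # fails: adjusted p is not below 0.05
print(is_significant(0.01, -1.5))   # passes: the fold-change sign is ignored
print(is_significant(0.01, 0.99))   # fails: fold change is too small
```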
PCA — Principal Component Analysis
PCA reduces high-dimensional proteomics data to two principal components for sample-level visualization, revealing batch effects, outliers, and group separation.
Algorithm
Pro-ker uses dual-space PCA (the Gram-matrix formulation, equivalent to kernel PCA with a linear kernel), optimized for proteomics datasets where the number of samples (n) is much smaller than the number of features/proteins (m):
- Mean centering — Each protein's values are centered by subtracting that protein's mean across all samples
- Kernel matrix — The n x n kernel matrix K = X * X^T is computed (instead of the m x m covariance matrix)
- Eigendecomposition — Jacobi eigendecomposition extracts eigenvalues and eigenvectors from K
- Projection — PC scores are computed as eigenvectors scaled by the square root of their eigenvalues
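The four steps above can be sketched in pure Python. This is an illustrative implementation under stated assumptions, not Pro-ker's source: the names `jacobi_eigh` and `dual_pca` are hypothetical, the input is assumed to be a list of samples (rows) by proteins (columns) with no missing values, and the cyclic Jacobi sweep shown is one standard variant of the method.

```python
import math


def jacobi_eigh(matrix, max_sweeps=50):
    """Cyclic Jacobi eigendecomposition of a symmetric matrix.

    Returns (eigenvalues, vectors); column c of `vectors` is eigenvector c.
    """
    n = len(matrix)
    a = [row[:] for row in matrix]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    scale = sum(x * x for row in a for x in row) or 1.0
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(i + 1, n))
        if off <= 1e-22 * scale:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle that zeroes a[p][q].
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.hypot(theta, 1.0))
                c = 1.0 / math.hypot(t, 1.0)
                s = t * c
                for k in range(n):  # rotate columns p and q of a and v
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
                for k in range(n):  # rotate rows p and q of a
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return [a[i][i] for i in range(n)], v


def dual_pca(samples, n_components=2):
    """Return (scores, percent_variance_explained) for rows = samples."""
    n, m = len(samples), len(samples[0])
    if n < 3:
        raise ValueError("PCA requires at least 3 samples")
    # 1. Mean-center each protein (column) across samples.
    means = [sum(row[j] for row in samples) / n for j in range(m)]
    x = [[row[j] - means[j] for j in range(m)] for row in samples]
    # 2. n x n kernel (Gram) matrix K = X * X^T.
    k = [[sum(x[i][f] * x[j][f] for f in range(m)) for j in range(n)]
         for i in range(n)]
    # 3. Eigendecomposition; clip tiny negative round-off to zero.
    eigvals, vecs = jacobi_eigh(k)
    eigvals = [max(ev, 0.0) for ev in eigvals]
    top = sorted(range(n), key=lambda i: eigvals[i], reverse=True)[:n_components]
    # 4. Scores = eigenvectors scaled by sqrt(eigenvalue).
    scores = [[vecs[i][c] * math.sqrt(eigvals[c]) for c in top] for i in range(n)]
    total = sum(eigvals) or 1.0
    explained = [100.0 * eigvals[c] / total for c in top]
    return scores, explained


scores, explained = dual_pca([[1.0, 2.0], [3.0, 1.0], [2.0, 5.0], [6.0, 4.0]])
print(scores, explained)
```

Working with the n x n matrix keeps the eigendecomposition small even when there are thousands of proteins, since its cost depends on the number of samples rather than the number of features. The sign of each component is arbitrary, so plots from different implementations may be mirrored.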
Output
- PC1, PC2 — The first two principal components (axes of greatest variance)
- Variance explained — Percentage of total variance captured by each component, shown in axis labels
- Requires at least 3 samples
Configuration Options
| Parameter | Plot | Default | Description |
|---|---|---|---|
| FDR threshold | Volcano | 0.05 | Significance cutoff for adjusted p-values (0.05, 0.01, 0.001) |
| \|Log2 FC\| threshold | Volcano | 1.0 | Minimum absolute fold change for significance |
| S0 | Volcano | 0 | Low-abundance correction fudge factor (0 = off) |
| Impute missing | Volcano | On | Replace zeros with min(non-zero)/2 |
| Threshold lines | Volcano | On | Show/hide dotted threshold reference lines (Graph Settings) |
Export
CSV Export
The Export Analysis as CSV button (Analysis tab) produces a comprehensive file containing:
- Metadata — Software version, source file, protein counts
- Processing settings — Quantification type, normalization, pruning parameters
- Sample groups — Group assignments and derived groups
- Raw data — Pre-processing quantification values per sample
- Processed data — Post-filtering/normalization values with group means
- Volcano plot statistics — Per-protein fold change, p-value, FDR-adjusted p-value, and significance classification for each volcano plot on the canvas
- PCA scores — PC1/PC2 coordinates per sample with variance explained for each PCA plot on the canvas
Canvas Export
Plots on the canvas can be exported as:
- SVG — Fully vectorized, editable in Illustrator/Inkscape/Figma
- PNG — High-resolution raster at 2x scale
References
- Benjamini, Y. & Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. B, 57(1), 289-300.
- Tusher, V.G., Tibshirani, R. & Chu, G. (2001). Significance analysis of microarrays applied to the ionizing radiation response. PNAS, 98(9), 5116-5121.
- Tyanova, S. et al. (2016). The Perseus computational platform for comprehensive analysis of (prote)omics data. Nature Methods, 13(9), 731-740.
- Cox, J. & Mann, M. (2008). MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nature Biotechnology, 26(12), 1367-1372.
Version History
See CHANGELOG.md for the full version history.
Current version: 4.2.2 (April 2026)
Citation
Ngo, B.M. (2026). Pro-ker Proteomics Analysis [Software].
License
Proprietary. See LICENSE for details.