Skyline-PRISM
PRISM (Proteomics Reference-Integrated Signal Modeling) is a Python package for retention time-aware normalization of LC-MS proteomics data exported from Skyline, with robust protein quantification via several selectable rollup methods.
Key Features
- Robust quantification with Tukey median polish: Can be used for both transition→peptide and peptide→protein rollups - automatically handles outliers without pre-identification
- Reference-anchored ComBat batch correction: Full implementation of empirical Bayes batch correction with automatic QC evaluation using reference/QC samples
- Dual-control validation: Uses intra-experiment QC samples to validate that corrections work without overfitting
- Sample outlier detection: Automatic detection of samples with abnormally low signal (failed injections, degradation) with optional exclusion
- Flexible protein inference: Multiple strategies for handling shared peptides (all_groups, unique_only, razor)
- Two-arm normalization pipeline: Separate paths for peptide-level and protein-level output, with batch correction applied at the appropriate level
- Local RT and global peptide normalization: Normalizes each sample to the median loess fit of signal abundance across RT; evaluated using QC and reference samples
Installation
pip install skyline-prism
Or for development:
git clone https://github.com/maccoss/skyline-prism
cd skyline-prism
pip install -e ".[dev,viz]"
Input File Formats
PRISM accepts Skyline reports in three formats:
| Format | Extensions | Notes |
|---|---|---|
| Parquet | .parquet | Fastest I/O, smaller files (recommended) |
| CSV | .csv | Standard Skyline export format |
| TSV | .tsv, .txt | Tab-separated variant |
Skyline 24.1+ can export reports directly to parquet via File > Export > Report (select parquet format). Parquet is 2-10x faster to read and 5-10x smaller than CSV equivalents.
Multiple input files can be provided in a single run (from different batches or plates):
# Multiple parquet reports (one per plate/batch)
prism run -i plate1.parquet plate2.parquet plate3.parquet -o output/ -c config.yaml
# Mixed formats from different instruments/labs
prism run -i site1.csv site2.parquet site3.tsv -o output/ -c config.yaml
Column names are automatically matched regardless of whether spaces are present (Fragment Ion vs FragmentIon), underscored (Fragment_Ion), or space-free (invariant export format). This handles all Skyline export variants.
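One plausible way to picture this matching is a canonicalization step that strips spaces and underscores before comparison; the helper name `canonical` below is illustrative, not the package's API:

```python
def canonical(name: str) -> str:
    # Collapse "Fragment Ion", "Fragment_Ion", and "FragmentIon" into one key
    return name.replace(" ", "").replace("_", "").lower()

variants = ["Fragment Ion", "Fragment_Ion", "FragmentIon"]
keys = {canonical(v) for v in variants}
# All three export variants map to the single key 'fragmention'
```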
Quick Start
Run the full pipeline
# Single parquet report (recommended format)
prism run -i skyline_report.parquet -o output_dir/ -c config.yaml -m sample_metadata.csv
# Single CSV report
prism run -i skyline_report.csv -o output_dir/ -c config.yaml -m sample_metadata.csv
# Multiple reports (merged and batch-corrected together)
prism run -i plate1.parquet plate2.parquet plate3.parquet -o output/ -c config.yaml -m metadata.csv
# Multiple metadata files are automatically merged
prism run -i data.parquet -o output/ -c config.yaml -m batch1_meta.csv batch2_meta.csv
This produces:
- corrected_peptides.parquet - Peptide-level quantities (normalized, batch-corrected)
- corrected_proteins.parquet - Protein-level quantities (normalized, batch-corrected)
- peptides_rollup.parquet - Raw peptide abundances from transition rollup (before normalization)
- proteins_raw.parquet - Raw protein abundances from peptide rollup (before normalization)
- protein_groups.csv - Protein group definitions
- peptide_residuals.parquet - Median polish residuals (for outlier analysis)
- metadata.json - Processing metadata and provenance
- prism_run_YYYYMMDD_HHMMSS.log - Detailed log file with all processing steps and timings
- qc_report.html - HTML QC report with embedded diagnostic plots
- qc_plots/ - Directory with QC plots (if save_plots: true)
Scale: Output files contain linear-scale abundances (raw peak area units). The pipeline operates on log2 scale internally but back-transforms to linear before writing output.
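The internal/external scale relationship is a simple round trip; a minimal sketch:

```python
import numpy as np

# Linear-scale peak areas as exported by Skyline
raw = np.array([1.0e6, 4.0e6, 2.5e5])

log2_scale = np.log2(raw)        # internal working scale
linear_out = 2.0 ** log2_scale   # back-transform before writing output

assert np.allclose(linear_out, raw)
```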
Key output columns in protein/peptide parquet files:
- abundance - Linear-scale abundance (normalized, batch-corrected)
- abundance_raw - Original linear-scale abundance from Skyline
- uncertainty - Propagated uncertainty (log2 scale, for downstream analysis)
- cv_peptides - CV across peptides (protein-level quality metric)
- n_peptides - Number of peptides used in rollup
- qc_flag - QC warnings (e.g., low_peptide_count(n), single_peptide_in_sample)
Protein Metadata (parsimony “leading_” semantics)
Protein groups are formed by parsimony. Each group has a canonical representative (“leading protein”) used to label group-level metadata. The following metadata columns appear in corrected_proteins.parquet:
- protein_group: Protein group ID (PG####)
- leading_protein: Representative accession for the leading protein
- leading_name: Representative protein name (from Skyline Protein)
- leading_uniprot_id: UniProt accession for the leading protein (from Skyline Protein Accession)
- leading_gene_name: Gene symbol for the leading protein (from Skyline Protein Gene)
- leading_description: Full protein description for the leading protein (from Skyline Protein)
The leading_ prefix clarifies that these fields describe the group's canonical representative, avoiding ambiguity with member/subsumed proteins. Member/subsumed lists are available in protein_groups.csv.
See SPECIFICATION.md for complete column definitions.
Reproducibility: Re-run from provenance
The metadata.json output contains the complete processing configuration, enabling reproducible re-runs:
# Re-run with exact same parameters on new data
prism run -i new_data.csv -o output2/ --from-provenance output1/metadata.json
# Override specific settings while keeping others from provenance
prism run -i new_data.csv -o output2/ --from-provenance output1/metadata.json -c overrides.yaml
Regenerate QC reports
To regenerate QC reports from an existing PRISM output directory (useful after visualization updates):
prism qc -d output_dir/
This reads the processed parquet files and regenerates the QC report without re-running the full pipeline.
Generate a configuration template
Generate an annotated configuration file template:
# Full template with all options documented
prism config-template -o config.yaml
# Minimal template with only common options
prism config-template --minimal -o config.yaml
Merge multiple Skyline reports
The prism merge command merges multiple reports into a single sorted parquet file without running the full pipeline. Accepts CSV, TSV, and parquet inputs in any combination:
# Merge CSV reports
prism merge report1.csv report2.csv -o unified_data.parquet -m sample_metadata.csv
# Merge parquet reports (faster)
prism merge plate1.parquet plate2.parquet plate3.parquet -o unified_data.parquet
# Mixed formats
prism merge site1.csv site2.parquet -o unified_data.parquet -m metadata.csv
Note: prism run also accepts multiple input files directly and performs the merge internally, so prism merge is mainly useful when you want the merged parquet for downstream use outside PRISM.
Compare rollup methods
Compare peptide CVs between two PRISM runs using different rollup methods. This is useful for evaluating whether library-assisted rollup is helping to reduce interference for specific peptides.
Workflow:
- Run PRISM twice with different rollup methods:
# Run 1: Sum method (baseline)
prism run -i data.parquet -o output_sum/ -c config_sum.yaml
# Run 2: Library-assist method
prism run -i data.parquet -o output_lib/ -c config_lib.yaml
- Compare the results:
prism compare -1 output_sum/ -2 output_lib/ -o comparison_report.html
Options:
| Flag | Description | Default |
|---|---|---|
| -1, --run1 | First output directory (baseline, e.g., sum method) | Required |
| -2, --run2 | Second output directory (comparison, e.g., library_assist) | Required |
| -o, --output | Output HTML report path | comparison_report.html |
| -s, --sample-type | Sample type to analyze (reference, qc, or all) | reference |
| -n, --top-n | Number of top peptides to show in detail | 100 |
| -r, --ranking | How to rank peptides: most_improved, most_worsened, highest_cv, largest_difference | most_improved |
| --save-plots | Save individual PNG plot files | False |
Example with all options:
prism compare \
-1 output_sum/ \
-2 output_lib/ \
-o comparison.html \
--sample-type reference \
--top-n 50 \
--ranking most_improved \
--save-plots
Required files in each output directory:
- corrected_peptides.parquet - Used for CV calculations
- merged_data.parquet - Required in run2 for library fitting visualization
- metadata.json - Method names and sample metadata
Comparison report contents:
- Summary statistics comparing CV distributions between methods
- Top N peptides ranked by CV improvement
- For each peptide: detailed library fitting visualization showing:
- Raw transition intensities
- Library-scaled predictions
- Outlier detection and removal steps
- R² values for each sample
- Final abundance comparison
This helps identify peptides where library-assisted rollup successfully removes interfered transitions and improves quantification precision.
QC report contents
The QC report includes:
- Intensity distribution plots: Before/after normalization comparison
- PCA analysis: Visualize batch effects and sample clustering across processing stages
- Control sample correlation: Heatmaps showing reproducibility of reference and QC samples
- CV distributions: Technical variation in control samples before/after normalization
- RT correction QC: If RT-dependent normalization is enabled, shows residuals for reference (fitted) and QC (held-out validation) samples
All plots are saved as PNGs in output_dir/qc_plots/ and embedded in the HTML report.
Processing Pipeline
The pipeline follows a two-arm design with batch correction applied at the reporting level:
Stage 1: Merge reports (CSV/TSV/parquet, streaming)
│
▼
Stage 2: Transition → Peptide rollup (Tukey median polish)
│
▼
Stage 2b: Peptide Global Normalization (median or VSN)
│ [Optional: RT-aware correction - disabled by default]
▼
Stage 2c: Peptide ComBat Batch Correction
│
├─────────────────────────────────┐
│ │
▼ ▼
Stage 3: Protein Parsimony PEPTIDE OUTPUT ARM
│ (corrected_peptides.parquet)
▼
Stage 4: Peptide → Protein Rollup (median polish)
│
▼
Stage 4b: Protein Global Normalization (median)
│
▼
Stage 4c: Protein ComBat Batch Correction
│
▼
Stage 5: Output Generation
│
▼
PROTEIN OUTPUT ARM
(corrected_proteins.parquet)
│
▼
Stage 5b: QC Report Generation (HTML + plots)
Key Implementation Notes:
- Streaming CSV merge handles large datasets (~47GB tested) without loading into memory
- Merge-and-sort in single pass using DuckDB for efficient multi-file processing
- Vectorized library-assisted rollup using median polish (~10x speedup on large datasets)
- Pre-sorted optimization skips redundant sorting when data is already sorted
- Both peptide and protein outputs receive independent batch correction
- Automatic log files saved to output directory with timestamp for reproducibility
- RT correction disabled by default - search engine calibration may not generalize between samples
- Protein parsimony applied before protein rollup to avoid double-counting shared peptides
Experimental Design Requirements
For optimal results, include these controls in each batch:
| Control Type | Description | Replicates/Batch |
|---|---|---|
| Inter-experiment reference | Commercial pool (e.g., Golden West CSF) | 1-8 |
| Intra-experiment QC | Pooled experimental samples | 1-8 |
Note: In 96-well plate formats, controls are typically placed once per row (8 replicates per batch). Smaller experiments may have as few as 1 replicate per batch.
Plus internal QCs in all samples:
- Protein QC: Yeast enolase (16 ng/μg sample)
- Peptide QC: PRTC (30-150 fmol/injection)
Configuration
Generate a configuration template with all options documented:
prism config-template -o config.yaml # Full template with all options
prism config-template --minimal -o config.yaml # Minimal template
Key settings:
# Sample type detection patterns (for automatic sample type assignment)
sample_annotations:
reference_pattern: # Inter-experiment reference (e.g., commercial pools)
- "-Pool_"
- "-Pool"
qc_pattern: # Intra-experiment QC (e.g., pooled experimental samples)
- "-Carl_"
- "-Carl"
# Everything else is treated as experimental
# Transition to peptide rollup (if using transition-level data)
transition_rollup:
enabled: true
method: "adaptive" # adaptive (recommended), median_polish, or sum
min_transitions: 3
learn_adaptive_weights: true # Learn optimal weights from reference samples (default when method=adaptive)
# Sample outlier detection (one-sided, low signal only)
sample_outlier_detection:
enabled: true
action: "report" # "report" to log only, "exclude" to remove from analysis
method: "iqr" # IQR-based on LINEAR scale (not log)
iqr_multiplier: 1.5
# Global normalization (applied after peptide rollup)
global_normalization:
method: "median" # median (recommended) or vsn
# RT correction (optional, applied together with global normalization)
rt_correction:
enabled: false # DISABLED BY DEFAULT - search engine RT calibration may not generalize
method: "spline"
spline_df: 5
per_batch: true
# Batch correction (applied at both peptide and protein levels)
batch_correction:
enabled: true
method: "combat" # Full empirical Bayes implementation
# Protein parsimony (applied before protein rollup)
parsimony:
enabled: true
fasta_path: "/path/to/database.fasta" # FASTA used in search
shared_peptide_handling: "all_groups" # recommended
# Protein rollup (peptide → protein aggregation)
protein_rollup:
method: "sum" # default; alternatives: median_polish, topn, ibaq, maxlfq
min_peptides: 3 # Minimum peptides for method application (below uses sum)
# QC report generation
qc_report:
enabled: true
save_plots: true # Save individual PNG files
embed_plots: true # Embed plots in HTML report
Automatic Batch Estimation
If batch information is not provided in the metadata file, PRISM can automatically estimate batch assignments from acquisition times. Three estimation methods are available:
- auto (default): Try gap detection first, fall back to fixed if no gaps found
- fixed: Divide samples evenly into N batches by acquisition time order
- gap: Only use gap detection (skip batch correction if no gaps found)
batch_estimation:
method: "auto" # "auto", "fixed", or "gap"
n_batches: null # Number of batches for "fixed" mode (or auto fallback)
gap_iqr_multiplier: 1.5 # For gap detection: lower = more sensitive
Example (divide into 5 artificial batches for drift correction):
batch_estimation:
method: "fixed"
n_batches: 5
To enable time-based batch estimation, include Result File > Acquired Time in your Skyline report. See SPECIFICATION.md for details.
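Gap detection can be sketched as IQR thresholding on the differences between sorted acquisition times; this is an illustrative reconstruction, not PRISM's internal function:

```python
import numpy as np

def estimate_batches_by_gaps(acq_times, iqr_multiplier=1.5):
    """Assign batch IDs by splitting at unusually large acquisition-time gaps."""
    times = np.sort(np.asarray(acq_times, dtype=float))
    gaps = np.diff(times)
    q1, q3 = np.percentile(gaps, [25, 75])
    threshold = q3 + iqr_multiplier * (q3 - q1)
    # A new batch starts after each gap exceeding the IQR-based threshold
    return np.concatenate([[0], np.cumsum(gaps > threshold)])

# Two runs of injections separated by a long pause (times in hours)
times = [0, 1, 2, 3, 50, 51, 52, 53]
print(estimate_batches_by_gaps(times))  # [0 0 0 0 1 1 1 1]
```

Lower `gap_iqr_multiplier` values make the threshold more sensitive, splitting on smaller pauses.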
Sample Outlier Detection
PRISM automatically detects samples with abnormally low signal intensity, which may indicate:
- Failed injections
- Sample degradation
- Instrument issues
- Empty wells
Key features:
- One-sided detection: Only flags samples with too little signal (not high outliers)
- Linear scale statistics: Detection uses linear scale (not log2) to avoid scale compression
- Two detection methods:
  - iqr: Flag samples below Q1 - 1.5×IQR (default)
  - fold_median: Flag samples with median < X% of overall median
- Configurable action: Report only or exclude from analysis
sample_outlier_detection:
enabled: true
action: "report" # "report" or "exclude"
method: "iqr" # "iqr" or "fold_median"
iqr_multiplier: 1.5 # Samples below Q1 - 1.5*IQR flagged
Example log output:
Sample outlier detection (IQR method, linear scale):
Q1=1234567, Q3=2345678, IQR=1111111
Lower bound: 567890 (Q1 - 1.5*IQR)
Overall median: 1789012
OUTLIER: Sample_042 - median=24 (0.0% of overall median)
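The one-sided IQR rule can be sketched on per-sample linear-scale medians; the function name below is illustrative, not the package's API:

```python
import numpy as np

def flag_low_signal_samples(sample_medians, iqr_multiplier=1.5):
    """One-sided IQR outlier detection on LINEAR-scale per-sample medians."""
    values = np.asarray(list(sample_medians.values()), dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    lower_bound = q1 - iqr_multiplier * (q3 - q1)
    # Only samples BELOW the bound are flagged; high signal is never an outlier
    return [name for name, med in sample_medians.items() if med < lower_bound]

medians = {"Sample_001": 1.8e6, "Sample_002": 2.1e6,
           "Sample_003": 1.9e6, "Sample_042": 24.0}
print(flag_low_signal_samples(medians))  # ['Sample_042']
```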
Tukey Median Polish (Default Rollup Method)
Both transition→peptide and peptide→protein rollups use Tukey median polish by default. This robust method:
- Automatically handles outliers: Interfered transitions or problematic peptides are downweighted through the median operation
- No pre-filtering required: Unlike quality-weighted methods, doesn't require explicit quality metrics
- Produces interpretable effects: Separates row effects from column effects (sample abundance)
- Handles missing values naturally: Works with incomplete matrices
What the row effects represent:
| Rollup Stage | Row Effects Represent | Column Effects Represent |
|---|---|---|
| Transition → Peptide | Transition interference (co-eluting analytes affecting specific transitions) | Peptide abundance per sample |
| Peptide → Protein | Peptide ionization efficiency (some peptides ionize better than others) | Protein abundance per sample |
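The sweep of alternating row and column medians can be sketched in a few lines of NumPy; this is a minimal illustration of the algorithm, not the package's tukey_median_polish API:

```python
import numpy as np

def median_polish(x, max_iter=10, tol=1e-6):
    """Minimal Tukey median polish: overall + row + column effects + residuals."""
    resid = np.array(x, dtype=float)
    overall = 0.0
    row = np.zeros(resid.shape[0])
    col = np.zeros(resid.shape[1])
    for _ in range(max_iter):
        rmed = np.median(resid, axis=1)      # sweep row medians into row effects
        row += rmed
        resid -= rmed[:, None]
        cmed = np.median(resid, axis=0)      # sweep column medians into column effects
        col += cmed
        resid -= cmed[None, :]
        delta = np.median(row); row -= delta; overall += delta
        delta = np.median(col); col -= delta; overall += delta
        if np.abs(rmed).max() < tol and np.abs(cmed).max() < tol:
            break
    return overall, row, col, resid

# Peptides x samples (log2); column effects estimate per-sample protein abundance
mat = np.array([[10.0, 11.0, 12.0],
                [12.0, 13.0, 14.0],
                [11.0, 12.0, 13.0]])
overall, row_eff, col_eff, residuals = median_polish(mat)
print(np.round(col_eff, 3))  # [-1.  0.  1.] relative sample effects
```

For this additive matrix the residuals are zero; with an interfered transition or peptide, the outlying cell appears in `resid` rather than distorting the column effects.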
Alternative Rollup Methods
While median polish is recommended, these alternative methods are available:
Transition → Peptide Rollup (when using transition-level data):
| Method | Description | Use Case |
|---|---|---|
| sum | Simple sum of intensities | Default, fast, straightforward |
| consensus | Inverse-variance weighted sum | Down-weights transitions that deviate from the cross-sample pattern |
| median_polish | Tukey median polish | Robust to outliers, produces residuals for QC |
| adaptive | Learned weighted average | When quality metrics (intensity, m/z, ShapeCorrelation) are available |
| topn | Top N by correlation/intensity | When a subset of transitions are known to be reliable |
| library_assist | Spectral library-based rollup | When you have a high-quality spectral library and want to detect/exclude interfered transitions |
Peptide → Protein Rollup:
| Method | Description | Use Case |
|---|---|---|
| median_polish | Tukey median polish (default) | General use, robust to outliers |
| topn | Mean of top N most intense peptides | Simple, when top peptides are reliable |
| ibaq | iBAQ (intensity / theoretical peptide count) | Absolute quantification across proteins |
| maxlfq | Maximum LFQ (MaxQuant-style) | When peptide ratios are more reliable than absolute values |
| sum | Simple sum of intensities | Fast but sensitive to outliers |
min_peptides Threshold:
The min_peptides setting controls the minimum number of peptides required for full-confidence method application:
protein_rollup:
method: "median_polish"
min_peptides: 3 # Default: 3
| Peptide Count | Method Used | low_confidence flag |
|---|---|---|
| 1 | Direct peptide value | True |
| < min_peptides | Sum (linear space) | True |
| >= min_peptides | Configured method | False |
Proteins below the threshold are still quantified using sum in linear space (consistent with median_polish output scale) and flagged with low_confidence=True.
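The fallback table above amounts to a small dispatch; a sketch of that logic (not PRISM's internal function):

```python
def choose_rollup(n_peptides, configured_method="median_polish", min_peptides=3):
    """Mirror the fallback table: return (method, low_confidence flag)."""
    if n_peptides == 1:
        return "direct", True        # single peptide: use its value directly
    if n_peptides < min_peptides:
        return "sum", True           # too few peptides: linear-space sum
    return configured_method, False  # enough peptides: configured method

print(choose_rollup(1))   # ('direct', True)
print(choose_rollup(2))   # ('sum', True)
print(choose_rollup(5))   # ('median_polish', False)
```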
iBAQ (Intensity-Based Absolute Quantification)
iBAQ normalizes protein abundances by the number of theoretical peptides, enabling comparison of absolute abundance across proteins:
protein_rollup:
method: "ibaq"
ibaq:
fasta_path: "/path/to/database.fasta" # Required for iBAQ
enzyme: "trypsin" # Must match search settings
missed_cleavages: 0 # Typically 0 for counting
from skyline_prism.fasta import get_theoretical_peptide_counts
# Calculate theoretical peptide counts for iBAQ
counts = get_theoretical_peptide_counts(
"/path/to/database.fasta",
enzyme="trypsin",
missed_cleavages=0,
)
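The iBAQ arithmetic itself is simple: summed intensity divided by the theoretical peptide count. A sketch with illustrative values (the accessions and counts below are made up):

```python
# Illustrative values: total linear-scale intensity and theoretical peptide counts
protein_intensity = {"P12345": 3.2e8, "Q67890": 3.2e8}
theoretical_peptides = {"P12345": 40, "Q67890": 10}

# iBAQ = summed intensity / number of theoretically observable peptides
ibaq = {acc: protein_intensity[acc] / theoretical_peptides[acc]
        for acc in protein_intensity}
# Equal raw intensity, but Q67890 (fewer theoretical peptides) gets 4x the iBAQ value
```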
Adaptive Rollup with Learned Weights
For transition→peptide rollup, the adaptive method learns optimal weighting parameters from reference samples to minimize CV:
transition_rollup:
method: "adaptive"
learn_adaptive_weights: true
adaptive_rollup:
beta_mz: 0.0 # Starting value (optimized automatically)
beta_shape_corr_outlier: 0.0 # Starting value (optimized automatically)
shape_corr_low_threshold: 0.7 # Threshold for "low" shape correlation
min_improvement_pct: 0.1 # Required improvement over sum to use adaptive weights
The weight formula:
w_t = exp(beta_mz * normalized_mz + beta_shape_outlier * outlier_frac)
Where:
- normalized_mz: m/z normalized to the [0, 1] range
- outlier_frac: Fraction of samples where shape correlation < threshold (indicates interference)
Key insight: When all betas are 0, all weights equal 1 (simple sum baseline). The optimizer only uses learned weights if they improve CV.
What gets optimized:
- beta_mz: Higher m/z fragments may have better signal (positive = favor high m/z)
- beta_shape_corr_outlier: Transitions with frequent interference (low shape correlation) should be down-weighted (negative = penalize)
The peptide abundance calculation:
Peptide_abundance = log2(Σ weight_t × intensity_t)
The transition intensities are the VALUES being summed. The learned weights adjust how much each transition contributes based on its quality metrics (m/z and interference level).
Learning process:
- Parameters are optimized on reference samples by minimizing median CV
- Results are validated on QC samples (held-out) to prevent overfitting
- Automatic fallback to simple sum if adaptive doesn't improve CV by min_improvement_pct
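The weight formula and the sum-baseline property can be sketched directly; `adaptive_peptide_abundance` below is an illustrative reconstruction, not the package API:

```python
import numpy as np

def adaptive_peptide_abundance(intensity, mz, outlier_frac,
                               beta_mz=0.0, beta_shape_outlier=0.0):
    """w_t = exp(beta_mz * normalized_mz + beta_shape_outlier * outlier_frac)."""
    mz = np.asarray(mz, dtype=float)
    mz_norm = (mz - mz.min()) / (mz.max() - mz.min())   # normalize m/z to [0, 1]
    w = np.exp(beta_mz * mz_norm
               + beta_shape_outlier * np.asarray(outlier_frac, dtype=float))
    return np.log2(np.sum(w * np.asarray(intensity, dtype=float)))

intensity = [1000.0, 2000.0, 500.0]
mz = [400.0, 600.0, 800.0]
outlier_frac = [0.0, 0.1, 0.8]   # third transition is frequently interfered

# All betas zero -> every weight is 1, i.e. the plain-sum baseline
baseline = adaptive_peptide_abundance(intensity, mz, outlier_frac)
assert np.isclose(baseline, np.log2(3500.0))

# A negative beta_shape_outlier down-weights the interfered transition
down = adaptive_peptide_abundance(intensity, mz, outlier_frac,
                                  beta_shape_outlier=-2.0)
```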
Consensus Rollup with Cross-Sample Weighting
The consensus method uses cross-sample consistency to weight transitions. It assumes that all transitions from a peptide should show the same fold-change pattern across samples - transitions that deviate from this consensus are down-weighted.
transition_rollup:
method: "consensus"
min_transitions: 3
Algorithm:
- Model: log2(T_ij) = α_i (transition offset) + β_j (sample abundance) + ε_ij
- Estimate initial β using row-centered medians (robust to outliers)
- Calculate residuals: ε_ij = log2(T_ij) - α_i - β_j
- Weight transitions: w_i = 1 / (variance(ε_i) + regularization)
- Return the weighted sum in linear space
Key insight: Transitions with high cross-sample residual variance are likely interfered. This method requires no external data (no spectral library or shape correlation) - it learns weights purely from the data itself.
When to use: Best for DDA data or when quality metrics like shape correlation are not available.
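The steps above can be sketched in NumPy; this is an illustrative reconstruction of the algorithm, not PRISM's internal implementation:

```python
import numpy as np

def consensus_rollup(T, regularization=0.01):
    """Consensus rollup sketch: weight transitions by cross-sample consistency.

    T: transitions x samples matrix of linear intensities.
    """
    L = np.log2(T)
    alpha = np.median(L, axis=1, keepdims=True)   # transition offsets
    beta = np.median(L - alpha, axis=0)           # per-sample abundance estimate
    resid = L - alpha - beta                      # deviations from consensus
    w = 1.0 / (np.var(resid, axis=1) + regularization)
    w = w / w.sum()
    return (w[:, None] * T).sum(axis=0)           # weighted sum, linear space

# Three transitions: the third deviates from the shared fold-change pattern
T = np.array([[100.0, 200.0, 400.0],
              [ 50.0, 100.0, 200.0],
              [300.0,  80.0, 500.0]])
peptide_abundance = consensus_rollup(T)
```

In this example the first two transitions share a consistent 1:2:4 pattern across samples, so their residual variance is near zero and they dominate the weighted sum; the third transition's weight collapses.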
Library-Assisted Rollup with Spectral Library
For transition->peptide rollup, the library_assist method uses a spectral library to detect and exclude interfered transitions. This is particularly effective when co-eluting peptides contribute signal to specific fragments.
transition_rollup:
method: "library_assist"
library_assist:
library_path: "/path/to/library.blib" # or .tsv (Carafe format)
fitting_method: "median_polish" # RECOMMENDED (or "least_squares")
mz_tolerance: 0.02 # m/z tolerance for matching (Da)
min_matched_fragments: 3 # Minimum fragments for valid fit
outlier_threshold: 1.0 # 1.0 = obs > 2x predicted is outlier
Two fitting methods available:
| Method | Description | When to Use |
|---|---|---|
| median_polish | Uses the MEDIAN to estimate scale; robust to 1-2 outliers automatically | Recommended for most data |
| least_squares | Classic OLS fitting; more sensitive to outliers | Very clean data |
Algorithm (median_polish, default):
- Match observed transitions to library fragments by m/z
- Model: log(observed) = log(library) + scale_factor + noise
- Estimate scale via the MEDIAN of log-ratios (robust to up to 50% outliers)
- Flag transitions with HIGH positive residuals as interfered (signal > expected)
- Remove the worst outlier and refit until convergence
- Final abundance = exp(scale_factor) × sum(ALL library intensities)
Key insight: Only high residuals indicate interference - because interference can only add signal, never remove it. Low or negative residuals are valid (low abundance or noise).
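The median-scale fit with one-sided pruning can be sketched as follows (working in log2 rather than natural log, which is equivalent up to a constant); this is an illustrative reconstruction, not the package's internal code:

```python
import numpy as np

def fit_library_scale(observed, library, outlier_threshold=1.0):
    """Median log-ratio scale fit with one-sided (high-residual) outlier pruning."""
    obs = np.asarray(observed, dtype=float)
    lib = np.asarray(library, dtype=float)
    keep = np.ones(len(obs), dtype=bool)
    while True:
        # Scale via the median of log-ratios: robust to a minority of outliers
        scale = np.median(np.log2(obs[keep]) - np.log2(lib[keep]))
        resid = np.log2(obs) - np.log2(lib) - scale
        # Only HIGH residuals indicate interference (it can only ADD signal)
        bad = keep & (resid > outlier_threshold)
        if not bad.any():
            break
        worst = np.argmax(np.where(bad, resid, -np.inf))
        keep[worst] = False   # drop the worst interfered transition and refit
    # Predicted abundance uses ALL library intensities at the fitted scale
    return 2.0 ** scale * lib.sum()

library = [1000.0, 500.0, 250.0, 125.0]
observed = [2000.0, 1000.0, 500.0, 2500.0]   # last fragment is interfered
abundance = fit_library_scale(observed, library)  # 2 * sum(library) = 3750
```

With `outlier_threshold=1.0`, a transition whose observed signal exceeds twice its library prediction is flagged, matching the configuration comment above.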
Supported library formats:
- BLIB (Skyline): SQLite-based format with zlib-compressed peak arrays
- Carafe TSV (DIA-NN): Tab-separated format with fragment annotations
Performance: The implementation uses vectorized least squares (BLAS matrix operations) to process all samples in parallel, providing ~10x speedup on large datasets.
Note: For very large cohorts (hundreds to thousands of samples), consider directLFQ which uses a different "intensity trace" algorithm with linear runtime scaling. Our maxlfq implementation uses the original pairwise ratio approach which scales quadratically with sample count.
ComBat Batch Correction
PRISM includes a full implementation of the ComBat algorithm (Johnson et al. 2007) for empirical Bayes batch correction:
from skyline_prism import combat, combat_from_long, combat_with_reference_samples
# For wide-format data (features × samples)
corrected = combat(data, batch, covar_mod=covariates)
# For long-format data (as used in PRISM pipeline)
corrected_df = combat_from_long(
data,
feature_col='peptide_modified',
sample_col='replicate_name',
batch_col='batch',
abundance_col='abundance'
)
# With automatic evaluation using reference/QC samples
result = combat_with_reference_samples(
data,
sample_type_col='sample_type',
# ... other parameters
)
Key features:
- Parametric and non-parametric prior options
- Reference batch support (adjust other batches to match reference)
- Mean-only correction option (for unequal batch sizes)
- Automatic evaluation using reference and QC sample CVs
Residual Analysis for Proteoform Discovery
Median polish produces residuals that capture deviations from the additive model. Following Plubell et al. 2022, peptides with large residuals should not be automatically discarded - they may indicate biologically interesting variation:
- Proteoform differences: Different forms of the same protein (splice variants, truncations)
- Post-translational modifications: PTMs affecting specific peptides
- Protein processing: Cleavage products (e.g., amyloid-beta from APP)
- Technical outliers: Interference or poor peak picking
Accessing residuals:
from skyline_prism import rollup_to_proteins, extract_peptide_residuals
# Protein rollup returns median polish results and topn results
protein_df, polish_results, topn_results = rollup_to_proteins(data, protein_groups)
# Extract residuals in long format for output
peptide_residuals = extract_peptide_residuals(polish_results)
# Columns include:
# - protein_group_id, peptide, replicate_name
# - residual: raw residual for each peptide/sample
# - residual_mad: robust measure of peptide's overall deviation
# - residual_max_abs: maximum deviation across samples
# Find peptides that consistently deviate
outliers = peptide_residuals.groupby(['protein_group_id', 'peptide']).agg({
'residual_mad': 'first'
}).reset_index()
outliers = outliers[outliers['residual_mad'] > 0.5] # Threshold of your choice
Core Rollup API:
For advanced use cases, the rollup_protein_matrix function provides direct access to the core protein rollup logic:
from skyline_prism import rollup_protein_matrix
import pandas as pd
# Create a peptide x sample matrix (log2 scale)
matrix = pd.DataFrame({
'Sample1': [12.5, 14.2, 13.1],
'Sample2': [12.8, 14.5, 13.4],
}, index=['PeptideA', 'PeptideB', 'PeptideC'])
# Roll up to protein-level abundance
result = rollup_protein_matrix(
matrix,
method='median_polish',
min_peptides=3,
)
# Result contains:
# - result.abundances: pd.Series of sample abundances (log2)
# - result.residuals: dict of peptide -> sample -> residual
# - result.polish_result: MedianPolishResult (for median_polish method)
# - result.topn_result: TopNResult (for topn method)
For transition-level residuals:
from skyline_prism import rollup_transitions_to_peptides, extract_transition_residuals
# Use median_polish method for transition rollup
result = rollup_transitions_to_peptides(data, method='median_polish')
# Extract transition residuals
transition_residuals = extract_transition_residuals(result)
Shared Peptide Handling
Three strategies available:
- all_groups (default, recommended): Apply shared peptides to ALL protein groups. Acknowledges proteoform complexity; avoids assumptions based on FASTA annotations.
- unique_only: Only use peptides unique to a single protein group. Most conservative.
- razor: Assign shared peptides to the group with the most peptides (MaxQuant-style).
Python API
from skyline_prism import (
# Data I/O
load_skyline_report,
merge_skyline_reports,
load_sample_metadata,
# Normalization
normalize_pipeline,
rt_correction_from_reference,
# Rollup
tukey_median_polish,
rollup_protein_matrix, # Core protein rollup function
rollup_to_proteins,
rollup_transitions_to_peptides,
# Batch correction
combat,
combat_from_long,
combat_with_reference_samples,
# Protein inference
compute_protein_groups,
# Validation
validate_correction,
generate_qc_report,
# Visualization
plot_intensity_distribution,
plot_normalization_comparison,
plot_pca,
plot_comparative_pca,
plot_control_correlation_heatmap,
plot_cv_distribution,
plot_comparative_cv,
plot_sample_correlation_matrix,
)
Data Visualization
PRISM provides visualization functions for QC assessment and normalization evaluation. Install visualization dependencies with:
pip install skyline-prism[viz]
Intensity Distribution
Compare sample intensity distributions before/after normalization:
from skyline_prism import plot_intensity_distribution, plot_normalization_comparison
# Box plot of intensity distributions
plot_intensity_distribution(
data,
sample_types={"Sample_1": "reference", "Sample_2": "qc", ...},
title="Intensity Distribution"
)
# Side-by-side comparison of normalization effect
plot_normalization_comparison(
data_before=raw_data,
data_after=normalized_data,
title="Normalization Effect"
)
PCA Analysis
Visualize batch effects and normalization impact with PCA:
from skyline_prism import plot_pca, plot_comparative_pca
# Single PCA plot with sample grouping
fig, pca_df = plot_pca(
data,
sample_groups={"Sample_1": "Batch_A", "Sample_2": "Batch_B", ...}
)
# Comparative PCA: Original → Normalized → Batch Corrected
plot_comparative_pca(
data_original=raw_data,
data_normalized=normalized_data,
data_batch_corrected=corrected_data, # Optional
sample_groups=sample_batches,
figsize=(18, 6)
)
Control Sample Correlation
Assess reproducibility using correlation heatmaps for control samples:
from skyline_prism import plot_control_correlation_heatmap, plot_sample_correlation_matrix
# Correlation heatmap for reference and QC samples only
fig, corr_matrix = plot_control_correlation_heatmap(
data,
sample_type_col="sample_type",
control_types=["reference", "qc"],
method="pearson"
)
# Full sample correlation matrix
fig, corr_matrix = plot_sample_correlation_matrix(
data,
sample_types=sample_type_dict
)
CV Distribution
Evaluate precision using CV distributions for control samples:
from skyline_prism import plot_cv_distribution, plot_comparative_cv
# CV distribution histogram for controls
fig, cv_data = plot_cv_distribution(
data,
sample_type_col="sample_type",
control_types=["reference", "qc"],
cv_threshold=20.0
)
# Compare CV before/after normalization
plot_comparative_cv(
data_before=raw_data,
data_after=normalized_data,
sample_type_col="sample_type",
control_type="reference"
)
RT Correction Quality Assessment
Visualize RT-dependent correction effectiveness. This is critical for validating that corrections learned from reference samples generalize to held-out QC samples:
from skyline_prism import plot_rt_correction_comparison, plot_rt_correction_per_sample
# 2×2 comparison showing reference (fitted) vs QC (held-out validation)
# before and after RT correction
fig, axes = plot_rt_correction_comparison(
data_before=data_before_correction,
data_after=data_after_correction,
sample_type_col="sample_type",
reference_mean=reference_mean_df, # Mean abundance per peptide from reference
figsize=(14, 10)
)
# Per-sample before/after comparison
fig, axes = plot_rt_correction_per_sample(
data_before=data_before_correction,
data_after=data_after_correction,
sample_type_col="sample_type",
reference_mean=reference_mean_df,
samples_per_type=3, # Number of samples to show per type
figsize=(16, 12)
)
The RT correction plots help assess:
- Whether the spline model captures RT-dependent variation in reference samples
- Whether corrections generalize to QC samples (held-out validation)
- Whether any samples have unusually large residuals after correction
All visualization functions support:
- show_plot=False to return the figure object for further customization
- save_path="/path/to/figure.png" to save directly to file
Citation
If you use Skyline-PRISM, please cite:
Tsantilas KA et al. "A framework for quality control in quantitative proteomics." J Proteome Res. 2024. DOI: 10.1021/acs.jproteome.4c00363
License
MIT License - see LICENSE file.