Python library for PRIDE Affinity Proteomics (PAD) archive data

pyprideap

pyprideap (Python PRIDE Affinity Proteomics) is a library for reading, validating, and analyzing affinity proteomics datasets from the PRIDE Affinity Archive (PAD).

Supports Olink (Explore, Explore HT, Target, Reveal) and SomaScan platforms.

Installation

Install pyprideap directly from PyPI:

pip install pyprideap

Or from source:

pip install "pyprideap[all] @ git+https://github.com/PRIDE-Archive/pyprideap.git"

With plotting and QC report support:

pip install "pyprideap[plots]"

With statistical testing:

pip install "pyprideap[stats]"

Or with all optional dependencies:

pip install "pyprideap[all]"

Quick Start

Read a dataset

import pyprideap as pp

# Auto-detect format from file extension and content
dataset = pp.read("olink_npx.csv")
dataset = pp.read("raw_data.adat")
dataset = pp.read("data.parquet")

# Force platform when auto-detection is ambiguous
dataset = pp.read("ambiguous.csv", platform="olink")
dataset = pp.read("ambiguous.csv", platform="somascan")
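Auto-detection is based on the file extension and, where that is ambiguous, the file content. A minimal sketch of the extension part of the idea (the names and mapping here are illustrative, not pyprideap's actual reader registry):

```python
from pathlib import Path

# Hypothetical extension-to-format hints; the real registry also
# sniffs file content when the extension alone is not decisive.
EXTENSION_HINTS = {
    ".adat": "somascan_adat",
    ".parquet": "olink_parquet",
    ".xlsx": "olink_xlsx",
}

def guess_format(path: str) -> str:
    suffixes = Path(path).suffixes  # e.g. [".npx", ".csv"]
    if suffixes and suffixes[-1] in EXTENSION_HINTS:
        return EXTENSION_HINTS[suffixes[-1]]
    if suffixes[-2:] == [".npx", ".csv"]:
        return "olink_csv"
    if suffixes and suffixes[-1] == ".csv":
        return "ambiguous_csv"  # would fall back to content sniffing
    raise ValueError(f"Unrecognised extension for {path}")
```

This is why `platform="olink"` / `platform="somascan"` exists: a bare `.csv` carries no platform information in its name.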

Generate a QC report

dataset = pp.read("olink_npx.csv")
pp.qc_report(dataset, "my_report.html")

The report includes interactive plots: expression distributions, PCA/t-SNE, LOD analysis, sample correlation, data completeness, CV distributions, and more. All plots are rendered with Plotly and include help tooltips explaining how to interpret each visualization.

Validate against PRIDE-AP guidelines

results = pp.validate(dataset)

for r in results:
    print(f"[{r.level.value}] {r.rule}: {r.message}")

Compute statistics

stats = pp.compute_stats(dataset)
print(stats.summary())

Fetch data from PRIDE Archive

client = pp.PrideClient()
project = client.get_project("PAD000001")
files = client.list_files("PAD000001")
urls = client.get_download_urls("PAD000001")

Command-Line Interface

pyprideap includes a CLI (powered by Click) for generating QC reports:

# From a local file (format auto-detected)
pyprideap report data.npx.csv
pyprideap report data.parquet -o my_report.html

# Force platform type
pyprideap report data.csv -p olink
pyprideap report data.adat -p somascan

# From a PRIDE accession (downloads data automatically)
pyprideap report -a PAD000001

# Generate individual plot files instead of a single report
pyprideap report data.npx.csv --split -o plots_dir/

# Include SDRF metadata for volcano plots
pyprideap report data.npx.csv --sdrf samples.sdrf.tsv

# Enable verbose logging (shows format detection, LOD method, PCA variance, etc.)
pyprideap report data.npx.csv -v

# List proteins above LOD from a local file
pyprideap proteins-above-lod data.npx.csv
pyprideap proteins-above-lod data.npx.csv -t 80 -o proteins.txt

# List proteins above LOD from a PRIDE accession
pyprideap proteins-above-lod -a PAD000001

Or via python -m:

python -m pyprideap report data.npx.csv

Verbose mode

Use -v / --verbose to enable detailed debug logging. This shows progress through each processing stage:

Reading olink_npx.csv...
08:12:01 [DEBUG] pyprideap.io.readers.registry: Format detected: olink_csv
08:12:01 [DEBUG] pyprideap.io.readers.olink_csv: Sample key selected: SampleID
08:12:01 [DEBUG] pyprideap.io.readers.olink_csv: Pivot shape: 20 samples x 1470 features
  20 samples, 1470 features (olink_explore)
08:12:01 [DEBUG] pyprideap.processing.lod: LOD method selected: REPORTED
08:12:02 [DEBUG] pyprideap.viz.qc.compute: Computing PCA...
08:12:02 [DEBUG] pyprideap.viz.qc.compute: PCA: variance explained=[0.42, 0.18]
...

QC Report

The HTML report is a self-contained, interactive document with a sidebar table of contents. It includes:

Quality Overview: LOD source comparison, QC x LOD stacked bar
Signal & Distribution: per-sample expression histograms, protein detectability
Data Completeness: per-sample above/below LOD, missing frequency distribution
Sample Relationships: PCA / t-SNE (dropdown toggle), sample correlation heatmap, clustered expression heatmap
Normalization QC: hybridization control scale (SomaScan)
Variability: CV distribution, intra/inter-plate CV

Each plot has a ? help button with guidance on interpretation.

Embedding reports in web pages

Reports automatically detect when loaded inside an <iframe> and switch to an embedded mode that hides the header, sidebar, and footer:

<iframe
  src="my_report.html"
  style="width: 100%; border: none; min-height: 600px;"
  id="qc-report">
</iframe>

<script>
// Auto-resize iframe to fit content
window.addEventListener('message', function(e) {
  if (e.data && e.data.type === 'pride-qc-resize') {
    document.getElementById('qc-report').style.height = e.data.height + 'px';
  }
});
</script>

The embedded report posts pride-qc-resize messages with the document height, allowing the parent page to resize the iframe automatically. The CSS class pride-embedded is added to the body, which:

  • Removes the sidebar navigation, header, and footer
  • Makes the background transparent
  • Removes card shadows for a seamless look

SDRF Integration

pyprideap can read SDRF (Sample and Data Relationship Format) files and merge sample metadata into datasets:

from pyprideap.io.readers.sdrf import read_sdrf, merge_sdrf, get_grouping_columns

# Read and parse an SDRF file
sdrf = read_sdrf("samples.sdrf.tsv")

# Merge SDRF metadata into an existing dataset
dataset = pp.read("olink_npx.csv")
dataset = merge_sdrf(dataset, sdrf)

# Identify columns suitable for differential expression grouping
group_cols = get_grouping_columns(sdrf)
# e.g. ["disease", "sex", "treatment"]

Column names are automatically shortened from the full SDRF syntax (e.g. characteristics[disease] becomes disease). Duplicate column names are disambiguated with numeric suffixes.
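The shortening and deduplication described above can be sketched in a few lines (an illustrative re-implementation, not pyprideap's actual code; in particular, the exact suffix style used for duplicates is an assumption here):

```python
import re

def shorten_sdrf_columns(columns):
    """Strip characteristics[...] / comment[...] / factor value[...]
    wrappers and disambiguate duplicate names with numeric suffixes."""
    short, seen = [], {}
    for col in columns:
        m = re.fullmatch(
            r"(?:characteristics|comment|factor value)\[(.+)\]",
            col.strip(),
            re.IGNORECASE,
        )
        name = m.group(1) if m else col.strip()
        count = seen.get(name, 0)
        seen[name] = count + 1
        short.append(name if count == 0 else f"{name}.{count}")
    return short
```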

Supported File Formats

Format            Platform                 Function
.npx.csv          Olink Explore / Target   pp.read()
.parquet          Olink Explore HT         pp.read()
.xlsx             Olink                    pp.read()
.adat             SomaScan                 pp.read()
.csv (SomaScan)   SomaScan                 pp.read()
.sdrf.tsv         Any                      read_sdrf()

All readers produce an AffinityDataset with a unified structure regardless of input format.

Data Model

@dataclass
class AffinityDataset:
    platform: Platform          # OLINK_EXPLORE, OLINK_EXPLORE_HT, SOMASCAN, etc.
    samples: pd.DataFrame       # Sample metadata (SampleID, SampleType, QC flags, ...)
    features: pd.DataFrame      # Protein/aptamer annotations (OlinkID, UniProt, Panel, ...)
    expression: pd.DataFrame    # Quantification matrix (NPX or RFU)
    metadata: dict              # Platform-specific extras

LOD (Limit of Detection)

pyprideap supports multiple LOD sources with automatic fallback:

  1. Reported LOD — from the LOD column in the data file
  2. NCLOD — computed from negative control samples (requires >= 10 controls)
  3. FixedLOD — pre-computed Olink reference values (bundled for Explore, Explore HT, Reveal)
  4. eLOD — estimated from buffer samples using MAD formula (SomaScan)
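As a rough sketch of the eLOD idea: a common formulation for SomaScan is median of the buffer-well RFUs plus a multiple of their median absolute deviation. The multiplier 3.3 below is a commonly cited choice, not necessarily the constant pyprideap uses:

```python
import statistics

def estimate_elod(buffer_rfus, k=3.3):
    """Estimated LOD from buffer wells: median + k * MAD.
    Illustrative only; the exact constant and MAD scaling may differ."""
    med = statistics.median(buffer_rfus)
    mad = statistics.median(abs(x - med) for x in buffer_rfus)
    return med + k * mad
```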

Statistical Analysis

With pip install "pyprideap[stats]":

# Per-protein t-test between groups
results = pp.ttest(dataset, group_var="Treatment")

# Wilcoxon rank-sum test
results = pp.wilcoxon(dataset, group_var="Treatment")

# ANOVA with covariates
results = pp.anova(dataset, group_var="Treatment", covariates=["Age", "Sex"])

# Post-hoc pairwise comparisons
posthoc = pp.anova_posthoc(dataset, group_var="Treatment")
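Per-protein two-group testing of this kind typically reduces to a Welch t-test on each protein's expression values. A self-contained sketch of that statistic (pyprideap presumably delegates to an established statistics library and adds multiple-testing correction; this is not its internal code):

```python
import math
from statistics import mean, variance

def welch_t(group_a, group_b):
    """Welch's two-sample t statistic and Welch-Satterthwaite
    degrees of freedom, for unequal variances and sizes."""
    na, nb = len(group_a), len(group_b)
    va, vb = variance(group_a) / na, variance(group_b) / nb
    t = (mean(group_a) - mean(group_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va**2 / (na - 1) + vb**2 / (nb - 1))
    return t, df
```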

Normalization

# Bridge normalization (combining two runs with shared samples)
normalized = pp.bridge_normalize(dataset1, dataset2, bridge_samples=["S1", "S2"])

# Subset normalization using reference proteins
normalized = pp.subset_normalize(dataset1, dataset2, reference_proteins=["P1", "P2"])

# Reference median normalization
normalized = pp.reference_median_normalize(dataset, reference_medians=medians)

# Select optimal bridge samples
bridges = pp.select_bridge_samples(dataset, n=8)

# Assess bridgeability between product versions
report = pp.assess_bridgeability(dataset1, dataset2)
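The core of bridge normalization is simple: for each protein, compute the median difference between the two runs on the shared bridge samples, then subtract that offset from one run. A sketch on plain nested dicts (pyprideap's `bridge_normalize` operates on AffinityDataset objects, so this mirrors the idea, not the API):

```python
import statistics

def bridge_offsets(run1, run2, bridge_samples):
    """Per-protein median offset of run2 relative to run1,
    computed on the shared bridge samples.
    run1/run2: {protein: {sample: value}}."""
    return {
        protein: statistics.median(
            run2[protein][s] - run1[protein][s] for s in bridge_samples
        )
        for protein in run1
    }

def apply_offsets(run2, offsets):
    """Subtract each protein's offset so run2 aligns with run1."""
    return {p: {s: v - offsets[p] for s, v in samples.items()}
            for p, samples in run2.items()}
```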

Additional normalization methods are available via direct import:

from pyprideap.processing.normalization import (
    lift_somascan,                # Cross-version SomaScan calibration (5k ↔ 7k ↔ 11k)
    quantile_smooth_normalize,    # Quantile normalization with smoothing
    scale_analytes,               # Per-analyte multiplicative scaling
    normalize_n,                  # Multi-step normalization pipeline
)

Preprocessing Pipelines

Platform-specific preprocessing pipelines bundle common QC and filtering steps:

from pyprideap.processing.olink import preprocess_olink
from pyprideap.processing.somascan import preprocess_somascan

# Olink: filter controls, detect outliers, LOD filtering, UniProt dedup
dataset, report = preprocess_olink(
    dataset,
    filter_controls=True,
    filter_qc_outliers=True,
    filter_lod=False,
)

# SomaScan: filter features/controls, RowCheck QC, outlier detection
dataset, report = preprocess_somascan(
    dataset,
    filter_features=True,
    filter_controls=True,
    filter_rowcheck=True,
)

print(report.summary())

Experimental Design

# Randomize samples to plates
plate_assignment = pp.randomize_plates(
    samples=sample_df,
    n_plates=4,
    keep_paired="SubjectID",  # keep longitudinal samples on same plate
    seed=42,
)
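The `keep_paired` constraint means randomization happens at the subject level: all samples from one subject land on the same plate. A minimal sketch of that idea under a simple fill-the-emptiest-plate rule (a hypothetical helper with a different signature; the real `pp.randomize_plates` works on a sample DataFrame and likely balances additional factors):

```python
import random
from collections import defaultdict

def randomize_plates_sketch(sample_ids, subject_ids, n_plates, seed=42):
    """Assign samples to plates at random while keeping all samples
    from the same subject together on one plate."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for sid, subj in zip(sample_ids, subject_ids):
        groups[subj].append(sid)
    blocks = list(groups.values())
    rng.shuffle(blocks)  # randomize subject order
    assignment, sizes = {}, [0] * n_plates
    for block in blocks:
        plate = sizes.index(min(sizes))  # fill the emptiest plate
        for sid in block:
            assignment[sid] = plate
        sizes[plate] += len(block)
    return assignment
```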

License

Apache License 2.0
