Pure-Python port of Olink Proteomics' R OlinkAnalyze — NPX I/O, bridge normalization, and per-protein differential expression for Olink proteomics.

pyolinkanalyze

A pure-Python port of R OlinkAnalyze (Olink Proteomics AB) — 100 % coverage of the OlinkAnalyze 3.8.2 public API: NPX I/O (CSV / TSV / Excel), bridge / subset / N-way normalization, per-protein differential expression (t-test, Wilcoxon, LMM, ANOVA, Kruskal-Wallis / Friedman, ordinal regression, plus post-hoc contrasts), limit-of-detection handling, plate randomization, plate-layout / distribution plots, pathway enrichment, and a full set of matplotlib plots.

  • No rpy2, no R install. Welch t-test via scipy.stats.ttest_ind(equal_var=False), Mann-Whitney via scipy.stats.mannwhitneyu(use_continuity=True), LMM via statsmodels.regression.mixed_linear_model.MixedLM, type-III ANOVA via statsmodels + sum-to-zero contrasts, ordinal regression via statsmodels.miscmodels.ordinal_model.OrderedModel.
  • Tidy long-format pandas.DataFrame interface — the same NPX schema Olink ships in their Explore / Target CSVs.
  • R-parity tests against OlinkAnalyze 3.8.2 — Pearson r > 0.99 (often =1.0) on per-protein test statistics and p-values for t-test, Wilcoxon, LMM, ANOVA and Kruskal-Wallis.
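For readers new to the long-format layout, here is a minimal sketch of what such a frame looks like and why it is convenient. The column names SampleID, OlinkID, Assay, UniProt and NPX follow Olink's exports; the values, the OID00045 row and the Treatment column are made up for illustration.

```python
import pandas as pd

# Toy long-format NPX table: one row per (sample, assay) pair.
# Values are invented; only the column layout matters here.
npx = pd.DataFrame({
    "SampleID": ["S1", "S1", "S2", "S2", "S3", "S3"],
    "OlinkID": ["OID00012", "OID00045"] * 3,
    "Assay": ["IL6", "TNF"] * 3,
    "UniProt": ["P05231", "P01375"] * 3,
    "NPX": [3.1, 5.4, 2.8, 5.9, 4.0, 5.1],
    "Treatment": ["ctrl", "ctrl", "ctrl", "ctrl", "drug", "drug"],
})

# Per-protein work is a groupby on OlinkID ...
per_protein_mean = npx.groupby("OlinkID")["NPX"].mean()

# ... and a samples-by-proteins matrix is one pivot away.
wide = npx.pivot(index="SampleID", columns="OlinkID", values="NPX")
```

The long format keeps sample annotations (here Treatment) on every row, which is what lets the per-protein tests take a single `variable=` argument.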

This is a standalone mirror of the canonical implementation that lives in omicverse. All algorithmic work is developed upstream in omicverse and synced here.

Install

pip install pyolinkanalyze

Dependencies: numpy, scipy, pandas, statsmodels. Plotting needs matplotlib + scikit-learn (pip install pyolinkanalyze[plotting]); olink_umap_plot optionally uses umap-learn (pip install pyolinkanalyze[umap]) and falls back to PCA otherwise.
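The optional-dependency fallback described above can be pictured roughly as follows. This is a sketch of the general pattern, not the package's actual code: embed_2d is a hypothetical helper, and the PCA fallback is done here with a plain numpy SVD rather than scikit-learn.

```python
import importlib.util

import numpy as np


def embed_2d(X, seed=0):
    """Return (method, coords): UMAP if umap-learn is importable, else PCA."""
    X = np.asarray(X, dtype=float)
    if importlib.util.find_spec("umap") is not None:
        import umap  # provided by the optional umap-learn package

        coords = umap.UMAP(n_components=2, random_state=seed).fit_transform(X)
        return "umap", coords
    # Fallback: PCA scores from the SVD of the column-centred matrix.
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return "pca", U[:, :2] * S[:2]
```

Checking for the module before importing it keeps the base install light while still giving every user a 2-D embedding.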

Quick-start

import pyolinkanalyze as pa

# Load Olink long-format NPX CSV (auto-detects ; vs , separators)
npx = pa.read_npx_csv("study_NPX_2024.csv")

# Differential expression: two-group Welch t-test per protein
res = pa.olink_ttest(npx, variable="Treatment")
res.head()
# OlinkID  Assay     UniProt  term            estimate  statistic  p.value   Adjusted_pval
# OID00012 IL6       P05231   group1 - group0    1.84    5.12      1.2e-5    8.6e-4
# ...

# Non-parametric alternative
res_w = pa.olink_wilcox(npx, variable="Treatment")

# Linear mixed-effects: NPX ~ Treatment + (1|Subject), per protein
res_lmm = pa.olink_lmer(npx, variable="Treatment", random="Subject")

# Bridge normalization across two batches (4 overlapping samples)
df_ref = pa.read_npx_csv("batch_A.csv")
df_target = pa.read_npx_csv("batch_B.csv")
joined = pa.olink_normalization(
    df_ref, df_target,
    overlapping_samples_df1=["B01", "B02", "B03", "B04"],
    overlapping_samples_df2=["B01", "B02", "B03", "B04"],
)

More tests (v0.2):

# Multi-group ANOVA + Tukey post-hoc
res_av = pa.olink_anova(npx, variable="Group")
res_ph = pa.olink_anova_posthoc(npx, variable="Group", effect="Group")

# Non-parametric (Kruskal-Wallis) + Dunn post-hoc
res_kw = pa.olink_one_non_parametric(npx, variable="Group")
res_dunn = pa.olink_one_non_parametric_posthoc(npx, variable="Group")

# Ordinal regression
res_ord = pa.olink_ordinal_regression(npx, variable="Group")

# Limit of detection (negative-control estimate) + below-LOD flags
npx_lod = pa.olink_lod(npx, lod_method="NCLOD")

# Pick optimal bridging samples
bridges = pa.olink_bridge_selector(npx, sample_missing_freq=0.1, n=8)

# Randomize a sample manifest across plates
# (`manifest` is a sample-manifest DataFrame loaded beforehand; not shown)
plated = pa.olink_plate_randomizer(manifest, subject_col="Subject", seed=0)

# Pathway enrichment on a DE result
gene_sets = pa.read_gmt("hallmark.gmt")
enr = pa.olink_pathway_enrichment(res, gene_sets, method="gsea")
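To make the pathway step less of a black box, here is a sketch of the two pieces involved in an over-representation analysis: reading gene sets from GMT text and computing a hypergeometric p-value. parse_gmt and ora_pvalue are hypothetical helpers written for illustration; they are not the package's read_gmt or olink_pathway_enrichment, whose internals may differ.

```python
from scipy.stats import hypergeom


def parse_gmt(text):
    """Parse GMT text: name <TAB> description <TAB> gene1 <TAB> gene2 ..."""
    sets = {}
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) >= 3:
            sets[fields[0]] = {g for g in fields[2:] if g}
    return sets


def ora_pvalue(hits, gene_set, universe):
    """One-sided hypergeometric p-value for over-representation."""
    universe = set(universe)
    hits = set(hits) & universe
    gene_set = set(gene_set) & universe
    k = len(hits & gene_set)
    # P(X >= k) when drawing len(hits) genes without replacement
    # from a universe that contains len(gene_set) set members.
    return hypergeom.sf(k - 1, len(universe), len(gene_set), len(hits))
```

For Olink panels the universe would normally be the set of measured assays, not the whole genome, since only those proteins could have been called significant.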

Plotting helpers:

import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 2, figsize=(12, 4))
pa.olink_volcano_plot(res, ax=axes[0])
pa.olink_qc_plot(npx, ax=axes[1])

# v0.2 plots
pa.olink_pca_plot(npx, color_by="Treatment")
pa.olink_heatmap_plot(npx)
pa.olink_boxplot(npx, "Treatment", olinkids=["OID00012"])
pa.olink_pathway_heatmap(enr)

# v0.2.1 — plate QC plots + general NPX reader
plated = pa.olink_plate_randomizer(manifest, seed=0)
pa.olink_display_plate_distributions(plated, fill_color="Treatment")
pa.olink_display_plate_layout(plated, color_by="Treatment")
npx = pa.read_npx("study_NPX_2024.xlsx")   # dispatches CSV / TSV / Excel

API coverage (v0.2.1)

100 % of the R OlinkAnalyze 3.8.2 public API is ported. The only names not mapped to Python functions are %>% (the R pipe) and manifest / npx_data1 / npx_data2 (bundled example datasets) — these are not functions.

I/O & normalization

| Python | R counterpart |
| --- | --- |
| read_npx | read_NPX (dispatches CSV / TSV / Excel) ✅ |
| read_npx_csv | read_NPX (long-format CSV path) ✅ |
| read_npx_excel | read_NPX (.xlsx / .xls Olink export) ✅ |
| olink_normalization | olink_normalization (bridge, difference-of-medians) |
| olink_normalization_reference_medians | olink_normalization(reference_medians=…) |
| olink_normalization_bridge | olink_normalization_bridge (paired median-of-diffs) |
| olink_normalization_subset | olink_normalization_subset |
| olink_normalization_n | olink_normalization_n (N-way chain / tree) |
| olink_bridge_selector | olink_bridgeselector |
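As a rough picture of what bridge normalization does, the sketch below computes a per-assay adjustment from the bridge samples measured in both batches and applies it to the target batch. It illustrates the paired median-of-differences variant only; bridge_adjust is a hypothetical helper, and it assumes the bridge samples carry the same SampleID in both frames.

```python
import pandas as pd


def bridge_adjust(ref, target, bridge_ids):
    """Shift `target` NPX onto the scale of `ref` using shared bridge samples."""
    keys = ["SampleID", "OlinkID"]
    r = ref.loc[ref["SampleID"].isin(bridge_ids), keys + ["NPX"]]
    t = target.loc[target["SampleID"].isin(bridge_ids), keys + ["NPX"]]
    paired = r.merge(t, on=keys, suffixes=("_ref", "_target"))
    # Per-assay adjustment factor: median of the paired differences.
    diff = paired["NPX_ref"] - paired["NPX_target"]
    adj = diff.groupby(paired["OlinkID"]).median()
    out = target.copy()
    out["NPX"] = out["NPX"] + out["OlinkID"].map(adj)
    return out, adj
```

Using the median rather than the mean keeps a single aberrant bridge sample from dragging the whole batch.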

Statistical tests & post-hoc

| Python | R counterpart |
| --- | --- |
| olink_ttest | olink_ttest (paired support) |
| olink_wilcox | olink_wilcox |
| olink_lmer | olink_lmer |
| olink_lmer_posthoc | olink_lmer_posthoc (Wald pairwise contrasts) |
| olink_anova | olink_anova (type-III, contr.sum) |
| olink_anova_posthoc | olink_anova_posthoc (Tukey HSD) |
| olink_one_non_parametric | olink_one_non_parametric (Kruskal / Friedman) |
| olink_one_non_parametric_posthoc | olink_one_non_parametric_posthoc (Dunn / paired Wilcoxon) |
| olink_ordinal_regression | olink_ordinalRegression |
| olink_ordinal_regression_posthoc | olink_ordinalRegression_posthoc |
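The Kruskal / Friedman pairing in the table reflects the usual split between independent groups and repeated measures. The sketch below shows the two scipy calls on synthetic data for a single protein; whether the package calls exactly these functions is an assumption.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Independent groups: three unrelated sets of NPX values (synthetic).
g1 = rng.normal(0.0, 1.0, 12)
g2 = rng.normal(0.0, 1.0, 12)
g3 = rng.normal(3.0, 1.0, 12)
H, p_kw = stats.kruskal(g1, g2, g3)

# Repeated measures: the same 12 subjects at three visits (synthetic).
# Friedman ranks within each subject, so between-subject spread drops out.
base = rng.normal(0.0, 1.0, 12)
v1, v2, v3 = base, base + 0.1, base + 2.0
chi2, p_fr = stats.friedmanchisquare(v1, v2, v3)
```

Because Friedman only looks at within-subject ranks, even the small 0.1 shift between the first two visits counts fully when it is consistent across subjects.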

LOD, study design & pathway

| Python | R counterpart |
| --- | --- |
| olink_lod | olink_lod (NCLOD / FixedLOD) |
| olink_plate_randomizer | olink_plate_randomizer |
| olink_pathway_enrichment | olink_pathway_enrichment (self-contained GSEA / ORA) |
| read_gmt | (helper: load gene sets) |
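Plate randomization with a subject column can be sketched as: shuffle the subjects, fill plates in that order without splitting a subject, then shuffle the order within each plate. randomize_plates is a hypothetical helper, and the plate_size of 88 is an assumed capacity for illustration; the package's own defaults and layout rules may differ.

```python
import random

import pandas as pd


def randomize_plates(manifest, subject_col="Subject", plate_size=88, seed=0):
    """Assign plates so that all samples from one subject share a plate."""
    rng = random.Random(seed)
    groups = [g for _, g in manifest.groupby(subject_col, sort=True)]
    rng.shuffle(groups)
    plate, used, parts = 1, 0, []
    for g in groups:
        if used and used + len(g) > plate_size:
            plate, used = plate + 1, 0  # start a new plate
        parts.append(g.assign(plate=plate))
        used += len(g)
    out = pd.concat(parts)
    # Shuffle sample order within each plate, then number the positions.
    out = out.groupby("plate", group_keys=False).sample(frac=1, random_state=seed)
    out["well_index"] = out.groupby("plate").cumcount() + 1
    return out.reset_index(drop=True)
```

This greedy fill does not handle a subject with more samples than one plate holds; such a subject would simply overflow its plate.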

Plotting (matplotlib)

| Python | R counterpart |
| --- | --- |
| olink_volcano_plot | olink_volcano_plot |
| olink_qc_plot | olink_qc_plot |
| olink_boxplot | olink_boxplot |
| olink_dist_plot | olink_dist_plot |
| olink_pca_plot | olink_pca_plot (sklearn.decomposition.PCA) |
| olink_umap_plot | olink_umap_plot (umap-learn, PCA fallback) |
| olink_heatmap_plot | olink_heatmap_plot |
| olink_lmer_plot | olink_lmer_plot |
| olink_pathway_heatmap | olink_pathway_heatmap |
| olink_pathway_visualization | olink_pathway_visualization |
| olink_display_plate_distributions | olink_displayPlateDistributions |
| olink_display_plate_layout | olink_displayPlateLayout |
| olink_pal, set_plot_theme, olink_color_discrete, olink_fill_discrete, olink_color_gradient, olink_fill_gradient | same names |

Not Python functions

| R name | Reason |
| --- | --- |
| %>% | R magrittr pipe: a language operator, not a function to port |
| manifest, npx_data1, npx_data2 | bundled example datasets, not functions |

Every other function in R OlinkAnalyze 3.8.2 has a Python counterpart in the tables above.

R-parity

tests/test_r_parity.py (auto-skipped if OlinkAnalyze isn't installed in the CMAP R env) compares against OlinkAnalyze 3.8.2:

| Quantity | Result |
| --- | --- |
| olink_ttest estimate (mean diff) | atol=1e-8 |
| olink_ttest statistic / p.value | Pearson r > 0.99 |
| olink_wilcox statistic / p.value | Pearson r > 0.99 |
| olink_lmer F-vs-t² / p.value | Pearson r > 0.95 |
| olink_anova F-statistic / p.value | Pearson r = 1.0000 (50 proteins) |
| olink_one_non_parametric Kruskal stat / p.value | Pearson r = 1.0000 (50 proteins) |
| olink_bridge_selector selected sample set | 100 % overlap with R |
| olink_lod below-LOD flags | > 95 % agreement |
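The two kinds of criteria in the table (an absolute tolerance and a Pearson correlation) can be computed along these lines. parity_report is a hypothetical helper for illustration, not the code in tests/test_r_parity.py.

```python
import numpy as np
from scipy import stats


def parity_report(py_values, r_values, atol=1e-8):
    """Summarise agreement between per-protein values from Python and R."""
    py_values = np.asarray(py_values, dtype=float)
    r_values = np.asarray(r_values, dtype=float)
    r, _ = stats.pearsonr(py_values, r_values)
    return {
        "pearson_r": float(r),
        "max_abs_diff": float(np.max(np.abs(py_values - r_values))),
        "allclose": bool(np.allclose(py_values, r_values, rtol=0.0, atol=atol)),
    }
```

The two criteria answer different questions: a correlation near 1 shows the rankings agree but tolerates a constant offset or scale factor, whereas the absolute tolerance requires the numbers themselves to match.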

Benchmark

200 proteins × 32 samples, 2 groups:

python examples/benchmark.py --runs 2

Typical Python pipeline wall-time:

| Function | Python (ms) |
| --- | --- |
| olink_ttest | ~400 |
| olink_wilcox | ~255 |

(LMM wall-time is dominated by statsmodels' per-protein fit; n_jobs parallelism is flagged for v0.2.)
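For a feel of how such wall-times are measured, the sketch below times a bare per-protein scipy loop on synthetic data of the same shape (200 proteins × 32 samples, 2 groups). It is not examples/benchmark.py and it skips the DataFrame handling the package does, so its timings are not comparable to the table above.

```python
import time

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_proteins, n_samples = 200, 32
X = rng.normal(size=(n_proteins, n_samples))  # synthetic NPX matrix
group = np.repeat([0, 1], n_samples // 2)     # two groups of 16


def time_ms(fn, runs=2):
    """Best-of-`runs` wall time in milliseconds."""
    best = float("inf")
    for _ in range(runs):
        t0 = time.perf_counter()
        fn()
        best = min(best, (time.perf_counter() - t0) * 1e3)
    return best


def per_protein_ttest():
    """Welch t-test p-value for each protein (row of X)."""
    return [
        stats.ttest_ind(row[group == 1], row[group == 0], equal_var=False).pvalue
        for row in X
    ]


elapsed = time_ms(per_protein_ttest)
print(f"per-protein Welch t-test: {elapsed:.1f} ms")
```

Taking the best of several runs reduces the influence of one-off costs such as imports and cache warm-up.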

Notes on the algorithm match

  • t-test: Welch unequal-variance with the Satterthwaite DF formula. scipy.stats.ttest_ind(equal_var=False) matches R t.test(var.equal=FALSE) exactly.
  • Wilcoxon: Asymptotic Mann-Whitney U with continuity correction (scipy.stats.mannwhitneyu(use_continuity=True, method='asymptotic')) matches R wilcox.test(exact=FALSE, correct=TRUE). Both R's W and scipy's statistic are the U value of the first sample passed, and U_1 + U_2 = n_1·n_2, so if the two implementations order the groups differently the per-protein statistics are mirrored and Pearson r is essentially −1 rather than +1; two-sided p-values are unaffected.
  • LMM: fitted with statsmodels MixedLM. Note that MixedLM.fit defaults to REML (reml=True), as does lme4::lmer (REML=TRUE); pass reml=False to fit by maximum likelihood and match lme4::lmer(REML=FALSE). Under REML, fixed-effect coefficients agree at ~1e-5.
  • BH adjustment: scipy.stats.false_discovery_control(method='bh') matches stats::p.adjust(method='BH') exactly.

Reproducing R results exactly

# Requires OlinkAnalyze in the CMAP R env
pytest tests/test_r_parity.py -v

Relationship to omicverse

Developed upstream in omicverse:

  • Canonical implementation: omicverse.protein.tl.de(adata, method='ttest', platform='olink')
  • Standalone mirror (this repo): same code, same API, minus the omicverse packaging.

Citation

If you use this package, please cite the upstream OlinkAnalyze package:

Olink Proteomics AB. OlinkAnalyze: Facilitate Analysis of Proteomic Data from Olink. R package version 5.0.0. https://cran.r-project.org/package=OlinkAnalyze

…and acknowledge omicverse / this repo for the Python port.

License

AGPL-3.0 — matches the upstream CRAN package.
