Pure-Python port of Olink Proteomics' R OlinkAnalyze — NPX I/O, bridge normalization, and per-protein differential expression for Olink proteomics.
pyolinkanalyze
A pure-Python port of R OlinkAnalyze (Olink Proteomics AB) — 100 % coverage of the OlinkAnalyze 3.8.2 public API: NPX I/O (CSV / TSV / Excel), bridge / subset / N-way normalization, per-protein differential expression (t-test, Wilcoxon, LMM, ANOVA, Kruskal-Wallis / Friedman, ordinal regression, plus post-hoc contrasts), limit-of-detection handling, plate randomization, plate-layout / distribution plots, pathway enrichment, and a full set of matplotlib plots.
- No `rpy2`, no R install. Welch t-test via `scipy.stats.ttest_ind(equal_var=False)`, Mann-Whitney via `scipy.stats.mannwhitneyu(use_continuity=True)`, LMM via `statsmodels.regression.mixed_linear_model.MixedLM`, type-III ANOVA via `statsmodels` + sum-to-zero contrasts, ordinal regression via `statsmodels.miscmodels.ordinal_model.OrderedModel`.
- Tidy long-format `pandas.DataFrame` interface: the same NPX schema Olink ships in its Explore / Target CSVs.
- R-parity tests against `OlinkAnalyze` 3.8.2: Pearson r > 0.99 (often = 1.0) on per-protein test statistics and p-values for t-test, Wilcoxon, ANOVA and Kruskal-Wallis, and r > 0.95 for LMM.
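A minimal sketch of that long-format layout, built by hand with pandas: one row per (sample, assay), so per-protein work is a `groupby` over `OlinkID` and a samples-by-proteins matrix is a `pivot`. The column names follow a typical Olink NPX export; the values, the second OlinkID (`OID00013`) and the `Treatment` annotation column are made up for illustration.

```python
import pandas as pd

# Tiny long-format NPX table: one row per (sample, assay).
# OID00013 and all NPX values are hypothetical; Treatment stands in for a
# sample annotation joined onto the export.
npx = pd.DataFrame({
    "SampleID":  ["S1", "S1", "S2", "S2", "S3", "S3"],
    "OlinkID":   ["OID00012", "OID00013"] * 3,
    "UniProt":   ["P05231", "P01375"] * 3,
    "Assay":     ["IL6", "TNF"] * 3,
    "NPX":       [5.1, 3.2, 6.0, 3.0, 4.8, 3.5],
    "Treatment": ["A", "A", "B", "B", "A", "A"],
})

# Per-protein summaries are a groupby over OlinkID ...
per_protein_mean = npx.groupby("OlinkID")["NPX"].mean()

# ... and a samples x proteins matrix (e.g. for PCA or heatmaps) is a pivot.
wide = npx.pivot(index="SampleID", columns="OlinkID", values="NPX")
print(wide)
```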
This is a standalone mirror of the canonical implementation that lives in omicverse. All algorithmic work is developed upstream in omicverse and synced here.
Install
```bash
pip install pyolinkanalyze
```
Dependencies: numpy, scipy, pandas, statsmodels. Plotting needs matplotlib + scikit-learn (`pip install "pyolinkanalyze[plotting]"`); `olink_umap_plot` optionally uses umap-learn (`pip install "pyolinkanalyze[umap]"`) and falls back to PCA otherwise.
Quick-start
```python
import pyolinkanalyze as pa

# Load Olink long-format NPX CSV (auto-detects ; vs , separators)
npx = pa.read_npx_csv("study_NPX_2024.csv")

# Differential expression: two-group Welch t-test per protein
res = pa.olink_ttest(npx, variable="Treatment")
res.head()
# OlinkID   Assay  UniProt  term             estimate  statistic  p.value  Adjusted_pval
# OID00012  IL6    P05231   group1 - group0  1.84      5.12       1.2e-5   8.6e-4
# ...

# Non-parametric alternative
res_w = pa.olink_wilcox(npx, variable="Treatment")

# Linear mixed-effects: NPX ~ Treatment + (1|Subject), per protein
res_lmm = pa.olink_lmer(npx, variable="Treatment", random="Subject")

# Bridge normalization across two batches (4 overlapping samples)
df_ref = pa.read_npx_csv("batch_A.csv")
df_target = pa.read_npx_csv("batch_B.csv")
joined = pa.olink_normalization(
    df_ref, df_target,
    overlapping_samples_df1=["B01", "B02", "B03", "B04"],
    overlapping_samples_df2=["B01", "B02", "B03", "B04"],
)
```
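`olink_lmer` fits one mixed model per protein. A minimal sketch of what a single protein's fit looks like with statsmodels, on simulated data (the subject/treatment layout and effect sizes are made up); this is an illustration of the model `NPX ~ Treatment + (1|Subject)`, not pyolinkanalyze's own code.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Simulated NPX for ONE protein: 10 subjects, each measured once per treatment.
subjects = np.repeat([f"P{i}" for i in range(10)], 2)
treatment = np.tile(["ctrl", "drug"], 10)
subject_effect = np.repeat(rng.normal(0, 1.0, 10), 2)  # random intercept per subject
npx = 5.0 + 0.8 * (treatment == "drug") + subject_effect + rng.normal(0, 0.3, 20)
df = pd.DataFrame({"Subject": subjects, "Treatment": treatment, "NPX": npx})

# NPX ~ Treatment + (1|Subject): `groups` defines the random-intercept grouping.
model = smf.mixedlm("NPX ~ Treatment", data=df, groups=df["Subject"])
fit_ml = model.fit(reml=False)  # maximum likelihood, comparable to lme4::lmer(REML = FALSE)
fit_reml = model.fit()          # statsmodels' default criterion is REML

print(fit_ml.params["Treatment[T.drug]"], fit_reml.params["Treatment[T.drug]"])
```

Because every subject contributes one control and one treated sample, the fixed-effect estimate here equals the mean within-subject difference under both ML and REML; the two criteria differ in the variance-component estimates.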
More tests (v0.2):
```python
# Multi-group ANOVA + Tukey post-hoc
res_av = pa.olink_anova(npx, variable="Group")
res_ph = pa.olink_anova_posthoc(npx, variable="Group", effect="Group")

# Non-parametric (Kruskal-Wallis) + Dunn post-hoc
res_kw = pa.olink_one_non_parametric(npx, variable="Group")
res_dunn = pa.olink_one_non_parametric_posthoc(npx, variable="Group")

# Ordinal regression
res_ord = pa.olink_ordinal_regression(npx, variable="Group")

# Limit of detection (negative-control estimate) + below-LOD flags
npx_lod = pa.olink_lod(npx, lod_method="NCLOD")

# Pick optimal bridging samples
bridges = pa.olink_bridge_selector(npx, sample_missing_freq=0.1, n=8)

# Randomize a sample manifest across plates
# (manifest: your sample-manifest DataFrame, one row per sample)
plated = pa.olink_plate_randomizer(manifest, subject_col="Subject", seed=0)

# Pathway enrichment on a DE result (res from the t-test above)
gene_sets = pa.read_gmt("hallmark.gmt")
enr = pa.olink_pathway_enrichment(res, gene_sets, method="gsea")
```
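ORA is one of the two modes `olink_pathway_enrichment` lists. The standard ORA test is a one-sided hypergeometric test per gene set; a minimal sketch with scipy, where `ora_pvalue` and its set-based inputs are hypothetical illustrations rather than the package's API (GSEA, used in the call above, is a rank-based method and is not shown).

```python
from scipy.stats import hypergeom


def ora_pvalue(hits: set, gene_set: set, universe: set) -> float:
    """One-sided over-representation p-value P(X >= k) for one gene set.

    hits: significant proteins; gene_set: pathway members;
    universe: all proteins measured on the panel.
    """
    gene_set = gene_set & universe   # only measurable members can be hits
    hits = hits & universe
    M = len(universe)                # population size
    n = len(gene_set)                # "successes" in the population
    N = len(hits)                    # number of draws
    k = len(hits & gene_set)         # observed overlap
    return float(hypergeom.sf(k - 1, M, n, N))  # sf(k - 1) = P(X >= k)


# Hypothetical example: 100 measured proteins, a 10-member pathway.
universe = {f"P{i}" for i in range(100)}
pathway = {f"P{i}" for i in range(10)}
significant = {"P0", "P1", "P2", "P50"}
print(ora_pvalue(significant, pathway, universe))
```

In practice the per-set p-values would then be BH-adjusted across all gene sets.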
Plotting helpers:
```python
import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 2, figsize=(12, 4))
pa.olink_volcano_plot(res, ax=axes[0])
pa.olink_qc_plot(npx, ax=axes[1])

# v0.2 plots
pa.olink_pca_plot(npx, color_by="Treatment")
pa.olink_heatmap_plot(npx)
pa.olink_boxplot(npx, "Treatment", olinkids=["OID00012"])
pa.olink_pathway_heatmap(enr)

# v0.2.1 — plate QC plots + general NPX reader
plated = pa.olink_plate_randomizer(manifest, seed=0)
pa.olink_display_plate_distributions(plated, fill_color="Treatment")
pa.olink_display_plate_layout(plated, color_by="Treatment")
npx = pa.read_npx("study_NPX_2024.xlsx")  # dispatches CSV / TSV / Excel
```
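The plate plots above are only meaningful if a subject's samples were not split across plates. A minimal sketch of subject-aware random plate assignment (greedy first-fit in random subject order); `randomize_plates`, the `Plate` column and the default of 88 sample wells per plate are assumptions for illustration, and unlike a full randomizer this does no covariate balancing.

```python
import numpy as np
import pandas as pd


def randomize_plates(manifest: pd.DataFrame, subject_col: str,
                     plate_size: int = 88, seed: int = 0) -> pd.DataFrame:
    """Assign samples to plates at random, keeping each subject on one plate."""
    rng = np.random.default_rng(seed)
    sizes = manifest.groupby(subject_col).size()       # samples per subject
    order = rng.permutation(sizes.index.to_numpy())     # random subject order
    plate_of, plate, used = {}, 1, 0
    for subj in order:
        n = int(sizes[subj])
        if n > plate_size:
            raise ValueError(f"subject {subj} does not fit on one plate")
        if used + n > plate_size:   # subject would overflow: open a new plate
            plate, used = plate + 1, 0
        plate_of[subj] = plate
        used += n
    out = manifest.copy()
    out["Plate"] = out[subject_col].map(plate_of)
    return out


# Hypothetical manifest: 30 subjects x 3 samples = 90 samples.
manifest = pd.DataFrame({
    "SampleID": [f"S{i}" for i in range(90)],
    "Subject": np.repeat([f"P{j}" for j in range(30)], 3),
})
plated = randomize_plates(manifest, "Subject", plate_size=88, seed=0)
print(plated["Plate"].value_counts())
```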
API coverage (v0.2.1)
100 % of the R OlinkAnalyze 3.8.2 public API is ported. The only names not mapped to Python functions are `%>%` (the R pipe) and `manifest` / `npx_data1` / `npx_data2` (bundled example datasets); these are not functions.
I/O & normalization
| Python | R counterpart |
|---|---|
| read_npx | read_NPX (dispatches CSV / TSV / Excel) ✅ |
| read_npx_csv | read_NPX (long-format CSV path) ✅ |
| read_npx_excel | read_NPX (.xlsx / .xls Olink export) ✅ |
| olink_normalization | olink_normalization (bridge, difference-of-medians) |
| olink_normalization_reference_medians | olink_normalization(reference_medians=…) |
| olink_normalization_bridge | olink_normalization_bridge (paired median-of-diffs) |
| olink_normalization_subset | olink_normalization_subset |
| olink_normalization_n | olink_normalization_n (N-way chain / tree) |
| olink_bridge_selector | olink_bridgeselector |
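Bridge normalization estimates one additive NPX offset per protein from samples run in both batches. A minimal sketch of the paired median-of-differences variant with pandas; `bridge_adjustment`, `apply_adjustment` and the toy data are hypothetical, and the real functions may differ in details such as QC filtering.

```python
import pandas as pd


def bridge_adjustment(ref: pd.DataFrame, target: pd.DataFrame,
                      bridges: list) -> pd.Series:
    """Per-protein median of paired (ref - target) NPX over the bridge samples."""
    r = (ref[ref["SampleID"].isin(bridges)]
         .pivot(index="SampleID", columns="OlinkID", values="NPX"))
    t = (target[target["SampleID"].isin(bridges)]
         .pivot(index="SampleID", columns="OlinkID", values="NPX"))
    diffs = r - t.reindex_like(r)   # align samples and proteins before subtracting
    return diffs.median()           # one adjustment per OlinkID


def apply_adjustment(target: pd.DataFrame, adj: pd.Series) -> pd.DataFrame:
    """Shift every target NPX value by its protein's adjustment."""
    out = target.copy()
    out["NPX"] = out["NPX"] + out["OlinkID"].map(adj)
    return out


# Toy batches sharing bridge samples B01..B03 (values made up).
ref = pd.DataFrame({"SampleID": ["B01", "B02", "B03"],
                    "OlinkID": ["OID1"] * 3, "NPX": [5.0, 6.0, 7.0]})
target = pd.DataFrame({"SampleID": ["B01", "B02", "B03", "X9"],
                       "OlinkID": ["OID1"] * 4, "NPX": [4.0, 5.0, 7.0, 3.0]})
adj = bridge_adjustment(ref, target, ["B01", "B02", "B03"])
normalized = apply_adjustment(target, adj)
print(normalized)
```

The difference-of-medians variant would use `r.median() - t.median()` instead of the median of the paired differences; the two agree when the per-sample offsets are constant.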
Statistical tests & post-hoc
| Python | R counterpart |
|---|---|
| olink_ttest | olink_ttest (paired support) |
| olink_wilcox | olink_wilcox |
| olink_lmer | olink_lmer |
| olink_lmer_posthoc | olink_lmer_posthoc (Wald pairwise contrasts) |
| olink_anova | olink_anova (type-III, contr.sum) |
| olink_anova_posthoc | olink_anova_posthoc (Tukey HSD) |
| olink_one_non_parametric | olink_one_non_parametric (Kruskal / Friedman) |
| olink_one_non_parametric_posthoc | olink_one_non_parametric_posthoc (Dunn / paired Wilcoxon) |
| olink_ordinal_regression | olink_ordinalRegression |
| olink_ordinal_regression_posthoc | olink_ordinalRegression_posthoc |
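Type-III sums of squares only test main effects sensibly under sum-to-zero coding (R's `contr.sum`). A minimal sketch of that combination for one protein with statsmodels, on simulated data with made-up `Group` and `Sex` factors; it shows the technique, not pyolinkanalyze's internals.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(1)

# Balanced toy design: 3 groups x 2 sexes, 4 samples per cell.
df = pd.DataFrame({
    "Group": np.repeat(["a", "b", "c"], 8),
    "Sex": np.tile(["F", "M"], 12),
})
df["NPX"] = rng.normal(5, 1, len(df)) + (df["Group"] == "c") * 1.5

# C(..., Sum) is patsy's sum-to-zero (deviation) coding, the analogue of R's
# contr.sum; with it, type-III main-effect tests are well defined even with
# an interaction term in the model.
fit = smf.ols("NPX ~ C(Group, Sum) * C(Sex, Sum)", data=df).fit()
table = anova_lm(fit, typ=3)
print(table)
```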
LOD, study design & pathway
| Python | R counterpart |
|---|---|
| olink_lod | olink_lod (NCLOD / FixedLOD) |
| olink_plate_randomizer | olink_plate_randomizer |
| olink_pathway_enrichment | olink_pathway_enrichment (self-contained GSEA / ORA) |
| read_gmt | (helper — load gene sets) |
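NCLOD derives each protein's limit of detection from its negative-control wells. A minimal sketch with pandas, assuming the rule LOD = median(NC) + max(0.2, 3 × SD(NC)); that formula is an assumption here, so check `olink_lod` for the exact rule it applies. `nc_lod`, `below_lod` and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd


def nc_lod(nc_npx: pd.DataFrame) -> pd.Series:
    """Per-protein LOD from negative-control NPX (rows: NC wells, columns: OlinkID).

    ASSUMED rule: median(NC) + max(0.2, 3 * SD(NC)).
    """
    med = nc_npx.median()
    spread = np.maximum(0.2, 3 * nc_npx.std(ddof=1))
    return med + spread


def below_lod(sample_npx: pd.DataFrame, lod: pd.Series) -> pd.DataFrame:
    """Boolean flags: True where a sample's NPX is below its protein's LOD."""
    return sample_npx.lt(lod, axis=1)


# Toy data (made up): 3 negative-control wells and 2 samples, 2 proteins.
nc = pd.DataFrame({"OID1": [1.0, 1.0, 1.0], "OID2": [0.0, 1.0, 2.0]})
samples = pd.DataFrame({"OID1": [1.1, 2.0], "OID2": [5.0, 3.9]})
lod = nc_lod(nc)
flags = below_lod(samples, lod)
print(lod, flags, sep="\n")
```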
Plotting (matplotlib)
| Python | R counterpart |
|---|---|
| olink_volcano_plot | olink_volcano_plot |
| olink_qc_plot | olink_qc_plot |
| olink_boxplot | olink_boxplot |
| olink_dist_plot | olink_dist_plot |
| olink_pca_plot | olink_pca_plot (sklearn.decomposition.PCA) |
| olink_umap_plot | olink_umap_plot (umap-learn, PCA fallback) |
| olink_heatmap_plot | olink_heatmap_plot |
| olink_lmer_plot | olink_lmer_plot |
| olink_pathway_heatmap | olink_pathway_heatmap |
| olink_pathway_visualization | olink_pathway_visualization |
| olink_display_plate_distributions | olink_displayPlateDistributions ✅ |
| olink_display_plate_layout | olink_displayPlateLayout ✅ |
| olink_pal, set_plot_theme, olink_color_discrete, olink_fill_discrete, olink_color_gradient, olink_fill_gradient | same names |
Not Python functions
| R name | Reason |
|---|---|
| %>% | R magrittr pipe — a language operator, not a function to port |
| manifest, npx_data1, npx_data2 | bundled example datasets, not functions |
Every other function in R OlinkAnalyze 3.8.2 has a Python counterpart in the tables above.
R-parity
tests/test_r_parity.py (auto-skipped if OlinkAnalyze isn't installed in the CMAP R env) compares against OlinkAnalyze 3.8.2:
| Quantity | Result |
|---|---|
| olink_ttest estimate (mean diff) | atol=1e-8 |
| olink_ttest statistic / p.value | Pearson r > 0.99 |
| olink_wilcox statistic / p.value | Pearson r > 0.99 |
| olink_lmer F-vs-t² / p.value | Pearson r > 0.95 |
| olink_anova F-statistic / p.value | Pearson r = 1.0000 (50 proteins) |
| olink_one_non_parametric Kruskal stat / p.value | Pearson r = 1.0000 (50 proteins) |
| olink_bridge_selector selected sample set | 100 % overlap with R |
| olink_lod below-LOD flags | > 95 % agreement |
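When correlating Wilcoxon statistics across implementations, group order matters: scipy's `mannwhitneyu` reports U for the first sample passed in, and U₁ + U₂ = n₁n₂, so swapping the order gives a statistic that correlates at exactly -1 while the two-sided p-values are unchanged. A small self-contained demonstration on simulated per-protein data (sizes and effect are made up):

```python
import numpy as np
from scipy.stats import mannwhitneyu, pearsonr

rng = np.random.default_rng(0)
n1, n2, n_prot = 12, 10, 50

u_ab, u_ba, p_ab, p_ba = [], [], [], []
for _ in range(n_prot):            # one simulated "protein" per iteration
    a = rng.normal(0.0, 1.0, n1)
    b = rng.normal(0.5, 1.0, n2)
    r1 = mannwhitneyu(a, b, use_continuity=True, method="asymptotic")
    r2 = mannwhitneyu(b, a, use_continuity=True, method="asymptotic")
    u_ab.append(r1.statistic); p_ab.append(r1.pvalue)
    u_ba.append(r2.statistic); p_ba.append(r2.pvalue)

u_ab, u_ba = np.array(u_ab), np.array(u_ba)
# U1 + U2 = n1 * n2, so reversing the group order flips the correlation sign.
r, _ = pearsonr(u_ab, u_ba)
print(r)
```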
Benchmark
200 proteins × 32 samples, 2 groups:
```bash
python examples/benchmark.py --runs 2
```
Typical Python pipeline wall-time:
| Function | Python (ms) |
|---|---|
| olink_ttest | ~400 |
| olink_wilcox | ~255 |
(LMM is not listed: its wall-time is dominated by statsmodels' per-protein model fit. `n_jobs` parallelism is noted as a v0.2 item.)
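`examples/benchmark.py` is not reproduced here. For a rough feel of per-protein loop cost on data of the same shape (200 proteins × 32 samples, 2 groups), a sketch that times a plain scipy Welch t-test loop over simulated NPX; the loop is illustrative, not pyolinkanalyze's implementation, so its timings are not comparable to the table above.

```python
import time

import numpy as np
import pandas as pd
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
n_prot, n_samp = 200, 32
groups = np.repeat(["g0", "g1"], n_samp // 2)

# Simulated long-format NPX: 200 proteins x 32 samples, 2 groups.
npx = pd.DataFrame({
    "OlinkID": np.repeat([f"OID{i:05d}" for i in range(n_prot)], n_samp),
    "Group": np.tile(groups, n_prot),
    "NPX": rng.normal(5.0, 1.0, n_prot * n_samp),
})


def per_protein_ttest(df: pd.DataFrame) -> pd.DataFrame:
    """Welch t-test (g1 vs g0) for every OlinkID."""
    rows = []
    for oid, sub in df.groupby("OlinkID", sort=False):
        g = sub["Group"].to_numpy()
        x = sub["NPX"].to_numpy()
        res = ttest_ind(x[g == "g1"], x[g == "g0"], equal_var=False)
        rows.append((oid, res.statistic, res.pvalue))
    return pd.DataFrame(rows, columns=["OlinkID", "statistic", "p.value"])


timings = []
for _ in range(2):                 # mirrors "--runs 2"
    t0 = time.perf_counter()
    out = per_protein_ttest(npx)
    timings.append((time.perf_counter() - t0) * 1000)  # milliseconds
print(f"median {np.median(timings):.0f} ms over {len(timings)} runs")
```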
Notes on the algorithm match
- t-test: Welch unequal-variance test with the Welch-Satterthwaite DF formula. `scipy.stats.ttest_ind(equal_var=False)` matches R `t.test(var.equal=FALSE)` exactly.
- Wilcoxon: asymptotic Mann-Whitney U with Yates continuity correction (`scipy.stats.mannwhitneyu(use_continuity=True, method='asymptotic')`) matches R `wilcox.test(exact=FALSE, correct=TRUE)`. R reports `W`, the U statistic of its first group, while scipy reports `U_1` for the first sample passed in, so the statistics agree up to group ordering (Pearson r ≈ ±1 depending on which group comes first).
- LMM: statsmodels' `MixedLM.fit` uses REML by default; pass `reml=False` to fit by maximum likelihood and match `lme4::lmer(REML=FALSE)`. With REML (`reml=True`, also `lmer`'s default), fixed-effect coefficients agree with R at ~1e-5.
- BH adjustment: `scipy.stats.false_discovery_control(method='bh')` matches `stats::p.adjust(method='BH')` exactly.
Reproducing R results exactly
```bash
# Requires OlinkAnalyze in the CMAP R env
pytest tests/test_r_parity.py -v
```
Relationship to omicverse
Developed upstream in omicverse:
- Canonical implementation: `omicverse.protein.tl.de(adata, method='ttest', platform='olink')`
- Standalone mirror (this repo): same code, same API, minus the omicverse packaging.
Citation
If you use this package, please cite the upstream OlinkAnalyze package:
Olink Proteomics AB. OlinkAnalyze: Facilitate Analysis of Proteomic Data from Olink. R package version 5.0.0. https://cran.r-project.org/package=OlinkAnalyze
…and acknowledge omicverse / this repo for the Python port.
License
AGPL-3.0 — matches the upstream CRAN package.
File details
Details for the file pyolinkanalyze-0.2.1.tar.gz.
File metadata
- Download URL: pyolinkanalyze-0.2.1.tar.gz
- Upload date:
- Size: 47.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.15
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f5340175240351495c4c96ab74d61f8bb84e7bf18fe7ac0880503f1bd36a4258 |
| MD5 | ebaecec0b3e2ea87fb69857cb7182a3e |
| BLAKE2b-256 | b136c381745458bad0dba9f66a561cd9aa3a9ae5d623700fa75f8a5dc7fd97bc |
File details
Details for the file pyolinkanalyze-0.2.1-py3-none-any.whl.
File metadata
- Download URL: pyolinkanalyze-0.2.1-py3-none-any.whl
- Upload date:
- Size: 39.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.15
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f9a357f4e4fbbe706bf80f0da86a6e2bc5aa8b67df9104e4321242d0c651e58f |
| MD5 | 12ce0b502c7af56092fc2c0f3e472dbd |
| BLAKE2b-256 | 256d4318e47a1b118a0219b43b167c857f9097215d8619d375a5d1775ba59b59 |