Pure-Python port of Bioconductor DEqMS — peptide-count-aware moderated t-tests for differential mass-spec proteomics.

These details have not been verified by PyPI

Project links

Project description

pydeqms

A pure-Python port of Bioconductor DEqMS (Zhu et al., MCP 2020) — peptide-count-aware moderated t-tests for differential mass-spec proteomics.

Three-step pipeline mirroring R: lmFit → eBayes → spectraCounteBayes → outputResult
No rpy2, no R install — limma lmFit, Smyth 2004 eBayes, and the DEqMS count-prior reimplemented directly in NumPy / SciPy
Bit-for-bit reproduction of limma lmFit (coefs / sigma) and eBayes (df.prior / s2.prior / t / p); DEqMS sca.t / sca.P.Value match R within numerical loess slack — see tests/test_r_parity.py
4.8× faster than R limma + DEqMS on the canonical 5000 × 12 benchmark (Pearson r = 1.0 on sca.t / sca.P; 100% top-500 DE gene agreement)
AnnData-friendly: accepts np.ndarray or pd.DataFrame (rows = proteins, columns = samples)

This is a standalone mirror of the canonical implementation that lives in omicverse (omicverse.protein.tl.de(method='deqms')). All algorithmic work is developed upstream in omicverse and synced here.

Install

pip install pydeqms

pydeqms depends on scikit-misc for the R-faithful loess; this is a pip-installable C/Cython package.

Quick-start (high-level pipeline)

import numpy as np
import pandas as pd
from pydeqms import deqms

# M: log2 protein intensity matrix (proteins × samples, NaN = missing)
M = pd.read_csv("proteinGroups_log2.tsv", sep="\t", index_col=0)

# design: samples × groups (e.g. two-group treatment)
design = pd.DataFrame({
    "control":   [1, 1, 1, 0, 0, 0],
    "treatment": [0, 0, 0, 1, 1, 1],
}, index=M.columns)

# count: per-protein peptide / PSM count (one column per protein)
count = pd.read_csv("peptide_counts.tsv", sep="\t", index_col=0)["count"]

result = deqms(
    M, design, count=count,
    contrast=np.array([-1.0, 1.0]),       # treatment - control
    fit_method="loess",                    # or 'nls' or 'spline'
)
result.head()
# gene        logFC  AveExpr     t  P.Value  adj.P.Val  count  sca.t  sca.P.Value  sca.adj.pval
# prot_0042   2.13    18.91    ...    ...      ...      14     5.86   3.2e-7       2.1e-4
# ...

Low-level API (mirrors R one-to-one)

from pydeqms import lm_fit, ebayes, spectra_count_ebayes, output_result

fit = lm_fit(M, design)                                   # limma::lmFit
ebayes(fit)                                                # limma::eBayes
spectra_count_ebayes(fit, count=count, fit_method="loess") # DEqMS::spectraCounteBayes
result = output_result(fit, coef_col=0)                    # DEqMS::outputResult

The LinearModelFit container exposes every field the R MArrayLM object does — coefficients, sigma, df_residual, stdev_unscaled, Amean, t, p_value, s2_post, df_prior, s2_prior, sca_t, sca_p, sca_postvar, sca_priorvar, sca_dfprior, fit_method, loess_model.

R-parity status

Step	R counterpart	Match
`lm_fit` coefficients	`limma::lmFit`	bit-exact (`atol=1e-9`)
`lm_fit` sigma / df / stdev_unscaled	`limma::lmFit`	bit-exact
`ebayes` `df_prior`, `s2_prior`	`limma::eBayes` (Smyth 2004 trigamma-MLE)	1e-3 / 1e-6
`ebayes` `t`, `p_value`	`limma::eBayes`	`atol=1e-4`
`spectra_count_ebayes` `sca_priorvar`	`DEqMS::spectraCounteBayes`	1% rtol (loess)
`spectra_count_ebayes` `sca_t`, `sca_p`	`DEqMS::spectraCounteBayes`	5% rtol / Pearson r = 1.0
Top-N DE proteins	`DEqMS::outputResult`	100% agreement

The 5% slack on sca.t is due to small differences between R stats::loess and skmisc.loess — both wrap the same Cleveland 1979 C backend but apply slightly different edge weights. The ranking and downstream calls are unaffected (Pearson r = 1.0, top-500 gene-set 100% identical).

Benchmark

5000 proteins × 12 samples, 10% planted DE, loess fit method (R default):

Step	R (ms)	Python (ms)	Speedup
`lmFit`	9	(vectorised in NumPy)	–
`eBayes`	14	(vectorised in NumPy)	–
`spectraCounteBayes`	132	(loess in `skmisc`)	–
Pipeline total	155	32.4	4.78×
Including `Rscript` startup	2861	32.4	88×

Run it yourself:

python examples/benchmark.py --runs 3 --n-proteins 5000 --n-per-group 6

Reproducing R results exactly

tests/r_reference_driver.R runs the original limma + DEqMS pipeline on the same TSV input the Python side dumps. tests/test_r_parity.py (skipped if no CMAP / DEqMS env) checks every numerical field — see the R-parity table above.

# Run the R-parity tests
pytest tests/test_r_parity.py -v

R function coverage

Every public function in DEqMS 1.24.0 has a Python counterpart.

R function	Python	Status
`lmFit` (no-weight, no-correlation)	`pydeqms.lm_fit`	✅ bit-exact
`eBayes` (Smyth 2004 trigamma-MLE)	`pydeqms.ebayes`	✅ bit-close
`spectraCounteBayes` (loess / nls / spline)	`pydeqms.spectra_count_ebayes`	✅ ≤5% sca.t slack
`outputResult`	`pydeqms.output_result`	✅
`contrasts.fit` (vector / matrix contrasts)	via the `contrast=` kwarg	✅
`equalMedianNormalization`	`pydeqms.equal_median_normalization`	✅ bit-exact
`medianSummary`	`pydeqms.median_summary`	✅ bit-exact
`medianSweeping`	`pydeqms.median_sweeping`	✅ bit-exact
`medpolishSummary`	`pydeqms.medpolish_summary`	✅ Pearson r > 0.999
`peptideProfilePlot`	`pydeqms.peptide_profile_plot`	✅ matplotlib
`Residualplot`	`pydeqms.residual_plot`	✅ matplotlib
`VarianceBoxplot`	`pydeqms.variance_boxplot`	✅ matplotlib
`VarianceScatterplot`	`pydeqms.variance_scatter_plot`	✅ matplotlib

Future work

Feature	Status
`peptideLevelTest` (PSM-level moderated t)	⏳ planned
Weighted least squares (`lmFit(M, design, weights=…)`)	⏳ planned
Random effects (correlation)	⏳ planned

Relationship to omicverse

Developed upstream in omicverse:

Canonical implementation: omicverse.protein.tl.de(adata, method='deqms')
Standalone mirror (this repo): same code, same API, minus the omicverse packaging

Citation

If you use this package, please cite both the DEqMS paper and Smyth 2004 (for the underlying empirical-Bayes framework):

Zhu Y., Orre L.M., Tran Y.Z., Mermelekas G., Johansson H.J., Malyutina A., Anders S., Lehtiö J. DEqMS: A method for accurate variance estimation in differential protein expression analysis. Mol Cell Proteomics 19, 1047–1057 (2020). DOI: 10.1074/mcp.TIR119.001646

Smyth, G. K. Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Statistical Applications in Genetics and Molecular Biology 3, Article 3 (2004).

…and acknowledge omicverse / this repo for the Python port.

License

LGPL-3 — matches the upstream Bioconductor package.

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

This version

0.1.1

May 20, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pydeqms-0.1.1.tar.gz (28.5 kB view details)

Uploaded May 20, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

pydeqms-0.1.1-py3-none-any.whl (22.6 kB view details)

Uploaded May 20, 2026 Python 3

File details

Details for the file pydeqms-0.1.1.tar.gz.

File metadata

Download URL: pydeqms-0.1.1.tar.gz
Upload date: May 20, 2026
Size: 28.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.20

File hashes

Hashes for pydeqms-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`1a59351cb60c79b111d65ff49c06c8bcbde85c944e25a2ff4bc725981c7ded84`
MD5	`52a115ffa3edd53052dd9d32bafcec29`
BLAKE2b-256	`b166aa02cd60019e9bc79ccf037200df9681bae4a01f20146054eaeecf739b6e`

See more details on using hashes here.

File details

Details for the file pydeqms-0.1.1-py3-none-any.whl.

File metadata

Download URL: pydeqms-0.1.1-py3-none-any.whl
Upload date: May 20, 2026
Size: 22.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.20

File hashes

Hashes for pydeqms-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`c452a215c959f3092198abc76053d57e7fe3f225eaf61215ad59746071069fe2`
MD5	`fff8ceb3a099c2f33666781923965f21`
BLAKE2b-256	`9f12776e74ec28cd54cd65f712b7eb2fb6fe06d14635fd369e40db11184eda72`

See more details on using hashes here.

pydeqms 0.1.1

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

pydeqms

Install

Quick-start (high-level pipeline)

Low-level API (mirrors R one-to-one)

R-parity status

Benchmark

Reproducing R results exactly

R function coverage

Future work

Relationship to omicverse

Citation

License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes