Pure-Python port of Bioconductor imputeLCMD — left-censored MNAR + MAR (MLE/KNN/SVD) imputation, model selection, and synthetic data for label-free proteomics.
pyimputelcmd
A pure-Python port of Bioconductor imputeLCMD (Lazar et al., J Proteome Res 2016) for left-censored missing-value imputation in label-free LC-MS/MS proteomics data.
- Full imputeLCMD API: all imputers (`MinDet`, `MinProb`, `QRILC`, `ZERO`, `MLE`, `KNN`, `SVD`, `MAR`, `MAR.MNAR`), the `model.Selector` MCAR/MNAR classifier, synthetic-data and roll-up helpers
- No `rpy2`, no R install: everything in NumPy / SciPy / pandas / scikit-learn
- Bit-for-bit reproduction of the R reference for the deterministic `MinDet` / `ZERO`; distribution-level (KS) parity for the stochastic `MinProb` / `QRILC`; high Pearson-correlation parity for `KNN` / `SVD` / `MLE` (R-parity tests in `tests/test_r_parity.py`)
- AnnData-friendly: accepts `np.ndarray` or `pd.DataFrame` (rows = proteins, columns = samples; preserves index/columns)
- A single `impute(X, method=…)` dispatcher for the omicverse wrapper
This is a standalone mirror of the canonical implementation that lives in omicverse (`omicverse.protein.pp.impute`). All algorithmic work is developed upstream in omicverse and synced here for users who want the imputers without the full omicverse stack.
Install
pip install pyimputelcmd
Quick start
import numpy as np
from pyimputelcmd import impute, impute_mindet, impute_minprob, impute_qrilc
rng = np.random.default_rng(0)
X = rng.normal(20.0, 1.0, (500, 6)) # 500 proteins × 6 samples
X[X < 19.0] = np.nan # left-censored MNAR (~16% missing)
# Three R-parity imputers — all accept the same (X, …) signature
out_md = impute_mindet(X) # 1st-percentile floor (q=0.01)
out_mp = impute_minprob(X, seed=0) # Gaussian below the floor
out_qr = impute_qrilc(X, seed=0) # truncated normal, QR-fit mu/sigma
# Single dispatcher (preferred for omicverse / config-driven workflows)
out = impute(X, method='qrilc', tune_sigma=1.0, seed=0)
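To make the "1st-percentile floor" comment concrete, here is a minimal NumPy sketch of the idea behind a MinDet-style fill: each sample (column) gets its own low quantile of the observed values, and every missing entry in that column is replaced by it. The name `mindet_sketch` is hypothetical and this is an illustration under that reading, not the package's code.

```python
import numpy as np


def mindet_sketch(X, q=0.01):
    """Fill NaNs in each sample (column) with that column's q-th quantile."""
    X = np.asarray(X, dtype=float)
    # Per-sample floor, computed from observed (non-NaN) values only.
    floor = np.nanquantile(X, q, axis=0)
    # Broadcast the (n_samples,) floor across rows wherever X is missing.
    return np.where(np.isnan(X), floor, X)


rng = np.random.default_rng(0)
X = rng.normal(20.0, 1.0, (500, 6))   # 500 proteins × 6 samples
X[X < 19.0] = np.nan                  # left-censor the low intensities

filled = mindet_sketch(X)
```

Observed values pass through unchanged; only the censored cells receive the per-sample floor, which is why this kind of fill is deterministic.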
Functional API (mirrors R one-to-one)
Imputers
| Python | R counterpart | Notes |
|---|---|---|
| `impute_mindet(X, q=0.01)` | `impute.MinDet` | Deterministic; bit-exact match |
| `impute_minprob(X, q=0.01, tune_sigma=1.0, seed=None)` | `impute.MinProb` | Stochastic; KS-equivalent to R |
| `impute_qrilc(X, tune_sigma=1.0, seed=None, upper_q=0.99)` | `impute.QRILC` | Stochastic; OLS-fit (μ, σ) match R `lm()` exactly |
| `impute_zero(X)` | `impute.ZERO` | Deterministic; bit-exact match |
| `impute_mle(X, max_iter=200, tol=1e-4, seed=None, sample=True)` | `impute.wrapper.MLE` | MVN-EM + I-step draw (`norm::imp.norm`); Pearson r ≈ 0.98 vs R |
| `impute_knn(X, K=10)` | `impute.wrapper.KNN` | Per-protein KNN (`sklearn.impute.KNNImputer`); Pearson r > 0.99 vs R |
| `impute_svd(X, K=2)` | `impute.wrapper.SVD` | Iterative rank-K SVD (Stacklies 2007); Pearson r > 0.99 vs R |
| `impute_mar(X, mcar_mask, method='mle')` | `impute.MAR` | Apply a MAR imputer to MCAR-flagged rows |
| `impute_mar_mnar(X, mcar_mask, method_mar='mle', method_mnar='qrilc')` | `impute.MAR.MNAR` | Combined MAR + MNAR pipeline |
| `impute(X, method=…)` | (none) | Dispatcher used by `omicverse.protein.pp.impute` |
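The mask-driven rows of the table describe a two-stage flow: rows flagged as MCAR are handled by a MAR imputer, and whatever is still missing is handled by an MNAR imputer. The sketch below illustrates only that control flow. It assumes `mcar_mask` is a boolean vector with one flag per protein row, and it swaps in simple stand-ins (row mean for the MAR stage, per-sample low quantile for the MNAR stage) rather than the MLE and QRILC methods named above. `mar_mnar_sketch` is a hypothetical name.

```python
import numpy as np


def mar_mnar_sketch(X, mcar_mask, q=0.01):
    """Two-stage fill: row means for MCAR-flagged rows, a low floor elsewhere.

    Assumption: mcar_mask is a boolean vector with one flag per protein row.
    """
    X = np.array(X, dtype=float)  # work on a copy
    mcar_mask = np.asarray(mcar_mask, dtype=bool)
    missing = np.isnan(X)

    # Stage 1 (MAR stand-in): flagged rows get their own observed mean.
    row_means = np.nanmean(X[mcar_mask], axis=1, keepdims=True)
    X[mcar_mask] = np.where(missing[mcar_mask], row_means, X[mcar_mask])

    # Stage 2 (MNAR stand-in): remaining gaps get a per-sample low quantile.
    floor = np.nanquantile(X, q, axis=0)
    return np.where(np.isnan(X), floor, X)


X = np.array([
    [20.0, np.nan, 22.0],   # flagged MCAR: gap filled with the row mean
    [18.0, 18.5, np.nan],   # unflagged: gap filled with the column floor
    [25.0, 24.0, 26.0],
    [17.0, 17.5, 17.2],
])
out = mar_mnar_sketch(X, mcar_mask=[True, False, False, False])
```

Running the MAR stage first means the MNAR floor is computed on a matrix that already contains the MAR-imputed values, which is one plausible ordering for a combined pipeline.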
Model selection & utilities
| Python | R counterpart | Notes |
|---|---|---|
| `model_selector(X) → (is_mar, censoring_thr)` | `model.Selector` | MCAR/MNAR classifier; 100% flag agreement with R |
| `insert_mvs(X, n_mv=200, mode='MCAR', …)` | `insertMVs` | Inject synthetic MVs for benchmarking |
| `generate_expression_data(n_features, n_samples1, n_samples2, …)` | `generate.ExpressionData` | Synthetic two-condition data |
| `pep2prot(peptide_data, rollup_map, method='median')` | `pep2prot` | Peptide → protein roll-up |
| `generate_rollup_map(mapping)` | `generate.RollUpMap` | Build a peptide → protein roll-up table |
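Injecting synthetic missing values is useful because the hidden entries are known, so an imputer's output can be scored against them. The sketch below contrasts the two mechanisms in plain NumPy: MCAR hides uniformly random cells, while a left-censored MNAR pattern hides the lowest intensities. The function `inject_missing_sketch` and its `mode` semantics are assumptions made for illustration and do not reproduce `insert_mvs` or `insertMVs`.

```python
import numpy as np


def inject_missing_sketch(X, n_mv, mode="MCAR", seed=None):
    """Return a copy of X with n_mv entries set to NaN, plus the hidden mask."""
    rng = np.random.default_rng(seed)
    X = np.array(X, dtype=float)  # copy, so the caller's matrix is untouched
    values = X.ravel()
    if mode == "MCAR":
        # Uniformly random positions, independent of intensity.
        idx = rng.choice(values.size, size=n_mv, replace=False)
    elif mode == "MNAR":
        # Left-censoring: hide the n_mv lowest intensities.
        idx = np.argsort(values)[:n_mv]
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    mask = np.zeros(values.size, dtype=bool)
    mask[idx] = True
    mask = mask.reshape(X.shape)
    X[mask] = np.nan
    return X, mask


rng = np.random.default_rng(1)
truth = rng.normal(20.0, 1.0, (100, 6))
X_mcar, hidden_mcar = inject_missing_sketch(truth, n_mv=60, mode="MCAR", seed=1)
X_mnar, hidden_mnar = inject_missing_sketch(truth, n_mv=60, mode="MNAR")
```

The returned mask marks exactly the cells that were hidden, so `truth[hidden_mcar]` can later be compared with the imputed values at the same positions.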
Matrix orientation
The R imputeLCMD package uses rows = proteins / peptides, columns = samples, and so does this port. AnnData users should transpose first:
import anndata as ad
from pyimputelcmd import impute

adata = ad.read_h5ad("proteins.h5ad") # samples × proteins (AnnData layout: obs × var)
X = adata.X.T # proteins × samples
imputed = impute(X, method='qrilc')
adata.X = imputed.T # back to samples × proteins
Reproducing the R reference exactly
tests/r_reference_driver.R invokes the original R imputeLCMD functions on the same input matrix dumped by the Python side. tests/test_r_parity.py then checks:
- MinDet / ZERO: `np.allclose(py, R, atol=1e-12)` (bit-exact deterministic)
- MinProb: KS test per column on the imputed marginal (p > 0.01)
- QRILC: closed-form OLS intercept/slope agree with R `lm()` to 1e-6, and the truncated-normal draws pass a KS test against R `rtmvnorm` (Gibbs)
- KNN / SVD / MLE: Pearson correlation against R on a realistic correlated (low-rank + noise) matrix: KNN r > 0.99, SVD r > 0.99, MLE r ≈ 0.98
- model.Selector: per-protein MCAR/MNAR flags agree with R (100% on the bimodal fixture)
# Run the R-parity tests (needs the CMAP env or env vars)
PYIMPUTELCMD_RSCRIPT=/path/to/Rscript pytest tests/test_r_parity.py -v
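The correlation-based checks in the list above can be sketched as a Pearson r computed between two imputed matrices. The version below restricts the comparison to the cells that were originally missing, which is an assumption: the test suite may compare whole matrices instead. The helper name and the synthetic `py_out` / `r_out` arrays are stand-ins for illustration, not real outputs from either implementation.

```python
import numpy as np


def imputed_entry_pearson(a, b, missing_mask):
    """Pearson r between two imputed matrices, restricted to imputed cells."""
    a_vals = np.asarray(a, dtype=float)[missing_mask]
    b_vals = np.asarray(b, dtype=float)[missing_mask]
    return float(np.corrcoef(a_vals, b_vals)[0, 1])


rng = np.random.default_rng(0)
missing_mask = rng.random((200, 6)) < 0.2
# Synthetic stand-ins for the two implementations' outputs (not real results).
py_out = rng.normal(20.0, 1.0, (200, 6))
r_out = py_out + rng.normal(0.0, 0.05, (200, 6))

r = imputed_entry_pearson(py_out, r_out, missing_mask)
```

Restricting to the imputed cells keeps the observed values, which are identical in both outputs by construction, from inflating the correlation.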
Coverage of the R imputeLCMD API
100% function coverage — all 14 functions exported by Bioconductor imputeLCMD are ported.
| R function | Status |
|---|---|
| `impute.MinDet`, `impute.MinProb`, `impute.QRILC` | ✅ v0.1 |
| `impute.ZERO` | ✅ v0.1.1 |
| `impute.wrapper.MLE` / `impute.wrapper.KNN` / `impute.wrapper.SVD` | ✅ v0.1.1 |
| `impute.MAR` / `impute.MAR.MNAR` | ✅ v0.1.1 |
| `model.Selector` (MCAR/MNAR classifier) | ✅ v0.1.1 |
| `insertMVs`, `generate.ExpressionData` | ✅ v0.1.1 |
| `pep2prot`, `generate.RollUpMap` | ✅ v0.1.1 |
Relationship to omicverse
Developed upstream in omicverse:
- Canonical implementation: `omicverse.protein.pp.impute`
- Standalone mirror (this repo): same code, same API, minus the omicverse packaging
Citation
If you use this package, please cite the original imputeLCMD paper:
Lazar, C., Gatto, L., Ferro, M., Bruley, C., Burger, T. Accounting for the Multiple Natures of Missing Values in Label-Free Quantitative Proteomics Data Sets to Compare Imputation Strategies. J Proteome Res 15, 1116–1124 (2016). DOI: 10.1021/acs.jproteome.5b00981
and acknowledge omicverse / this repo for the Python port.
License
GPL-3 — matches the upstream Bioconductor package.