Pure-Python port of CRAN MatrixEQTL -- ultra-fast eQTL mapping (linear / ANOVA / interaction models, cis/trans split, Benjamini-Hochberg FDR).
py-matrixeqtl
Pure-Python port of the R/CRAN package MatrixEQTL — Andrey A. Shabalin's ultra-fast eQTL mapping engine (Shabalin, Bioinformatics 2012, 28(10):1353–1358).
pymatrixeqtl is a standalone, dependency-light (numpy / scipy / pandas
only) reimplementation of MatrixEQTL's complete computational core. It
reproduces the upstream results to within floating-point round-off, with no R involved.
| Field | Value |
|---|---|
| PyPI / import name | pymatrixeqtl |
| License | LGPL-3 (same as upstream MatrixEQTL) |
| Upstream | CRAN MatrixEQTL 2.3 |
| Dependencies | numpy, scipy, pandas |
What it does
MatrixEQTL tests every SNP–gene pair for association extremely fast. The trick — faithfully replicated here — is to residualise the expression and genotype matrices on the covariates once, scale each row to unit norm, and then obtain every test statistic as a single entry of one big matrix product (a correlation). The t / F statistic and p-value follow analytically.
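The residualise-then-correlate idea can be sketched in plain NumPy. This is an illustration of the technique for the additive linear model only, not the package's actual code; the array names and the random toy data are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_snps, n_genes, n_cov = 50, 4, 3, 2

# Toy data (rows are SNPs / genes / covariates, columns are samples).
snps = rng.integers(0, 3, size=(n_snps, n_samples)).astype(float)
gene = rng.normal(size=(n_genes, n_samples))
cvrt = rng.normal(size=(n_cov, n_samples))

# Orthonormal basis for the intercept plus covariates.
design = np.vstack([np.ones(n_samples), cvrt]).T  # (n_samples, n_cov + 1)
q, _ = np.linalg.qr(design)

def residualise_and_normalise(mat):
    """Project the covariates out of every row, then scale rows to unit norm."""
    resid = mat - (mat @ q) @ q.T
    return resid / np.linalg.norm(resid, axis=1, keepdims=True)

snps_std = residualise_and_normalise(snps)
gene_std = residualise_and_normalise(gene)

# One matrix product yields the partial correlation of every gene-SNP pair.
r = gene_std @ snps_std.T  # (n_genes, n_snps)

# Residual degrees of freedom: samples - intercept - covariates - SNP term.
dof = n_samples - n_cov - 2
t_stat = r * np.sqrt(dof / (1.0 - r**2))
```

Each entry of `t_stat` equals the t-statistic of the SNP coefficient in the corresponding ordinary least-squares fit; a two-sided p-value then follows from the Student t distribution with `dof` degrees of freedom.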
- Three models
  - `modelLINEAR`: additive linear model `expr ~ SNP + covariates` (reports slope `beta`, t-statistic, p-value).
  - `modelANOVA`: genotype as a categorical factor (F-test).
  - `modelLINEAR_CROSS`: linear model with a SNP × last-covariate interaction term.
- cis / trans split: with SNP- and gene-position tables, associations are split into cis (within `cisDist` bp, default 1e6) and trans, each with its own p-value threshold.
- Benjamini-Hochberg FDR: computed exactly as MatrixEQTL does, over all tested pairs (not only those passing the threshold).
- Error-covariance whitening, the `min.pv.by.genesnp` tables and p-value histograms / Q-Q profiles: all ported faithfully.
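To illustrate what "over all tested pairs" means for the FDR column, here is a minimal sketch of a Benjamini-Hochberg adjustment in which the multiplier is the total number of tests rather than the number of retained rows. The function name `bh_fdr` and its arguments are hypothetical, and the sketch assumes the retained p-values are the smallest ones among all tests (which holds when they were selected by a p-value threshold).

```python
import numpy as np

def bh_fdr(pvalues, n_tests_total):
    """BH-style adjustment of retained p-values, scaled by the total test count.

    Assumes `pvalues` are the smallest p-values among `n_tests_total` tests,
    so their ranks within the retained set equal their ranks overall.
    """
    p = np.asarray(pvalues, dtype=float)
    order = np.argsort(p)
    ranks = np.arange(1, p.size + 1)
    scaled = p[order] * n_tests_total / ranks
    # Enforce monotonicity, working down from the largest retained p-value.
    adjusted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.minimum(adjusted, 1.0)
    out = np.empty_like(adjusted)
    out[order] = adjusted  # restore the caller's original ordering
    return out

# Two pairs passed the threshold out of 100 tested.
fdr = bh_fdr([0.001, 0.0002], n_tests_total=100)
```

Using the count of retained rows instead of `n_tests_total` would make the adjusted values far too optimistic, which is why the distinction matters.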
Install
```bash
pip install pymatrixeqtl
# or, from a checkout:
pip install -e .
```
Quick start
```python
import pymatrixeqtl as me

# MatrixEQTL ships an example dataset; it is bundled inside this package.
ex = me.load_example()

# cis / trans eQTL analysis, additive linear model
res = me.Matrix_eQTL_main(
    snps=ex["snps"], gene=ex["gene"], cvrt=ex["cvrt"],
    pvOutputThreshold=1e-2,        # trans threshold
    pvOutputThreshold_cis=1e-1,    # cis threshold
    snpspos=ex["snpspos"], genepos=ex["genepos"],
    useModel=me.modelLINEAR, cisDist=1e6,
)
res.cis      # tidy DataFrame: snps, gene, beta, statistic, pvalue, FDR
res.trans    # tidy DataFrame
print(res.summary())
```
Or use the one-call high-level wrapper, which accepts file paths, NumPy arrays or pandas DataFrames:
```python
res = me.eqtl(
    "SNP.txt", "GE.txt", "Covariates.txt",
    model=me.modelANOVA, pv_threshold=1e-5,
    snpspos="snpsloc.txt", genepos="geneloc.txt",
    pv_threshold_cis=1e-3, cis_dist=1e6,
)
```
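The cis distance argument decides which SNP-gene pairs are treated as cis. A minimal sketch of such a split is shown here, assuming a pair is cis when the SNP sits on the gene's chromosome and within `cis_dist` base pairs of the gene's start-end interval. The function `is_cis` and its argument names are hypothetical, and the package's exact boundary handling may differ.

```python
import numpy as np

def is_cis(snp_chr, snp_pos, gene_chr, gene_start, gene_end, cis_dist=1e6):
    """Return a boolean (n_genes, n_snps) mask that is True for cis pairs."""
    snp_chr = np.asarray(snp_chr)
    snp_pos = np.asarray(snp_pos, dtype=float)
    gene_chr = np.asarray(gene_chr)
    gene_start = np.asarray(gene_start, dtype=float)
    gene_end = np.asarray(gene_end, dtype=float)

    # Broadcast genes along rows and SNPs along columns.
    same_chr = gene_chr[:, None] == snp_chr[None, :]
    near = (snp_pos[None, :] >= gene_start[:, None] - cis_dist) & (
        snp_pos[None, :] <= gene_end[:, None] + cis_dist
    )
    return same_chr & near

mask = is_cis(
    snp_chr=["chr1", "chr1", "chr2"], snp_pos=[100, 5_000_000, 150],
    gene_chr=["chr1", "chr2"], gene_start=[500_000, 1_000], gene_end=[600_000, 2_000],
)
```

Pairs where `mask` is True would be compared against the cis threshold, and all remaining pairs against the trans threshold.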
The SlicedData container
MatrixEQTL's chunked matrix container is ported as
pymatrixeqtl.SlicedData, a NumPy-backed class with the row-slice
chunking preserved for big-matrix memory parity.
| R method | Python method |
|---|---|
| `$new(mat)` / `CreateFromMatrix` | `SlicedData(mat)` / `create_from_matrix` |
| `LoadFile` | `load_file` |
| `nRows()` / `nCols()` / `nSlices()` | `n_rows()` / `n_cols()` / `n_slices()` |
| `RowStandardizeCentered` | `row_standardize_centered` |
| `ColumnSubsample` | `column_subsample` |
| `RowReorder` | `row_reorder` |
| `ResliceCombined` | `reslice_combined` |
| `CombineInOneSlice` | `combine_in_one_slice` |
| `FindRow` | `find_row` |
| `SetNanRowMean` | `set_nan_row_mean` |
The original R names are also available as aliases (SlicedData.LoadFile,
SlicedData.nRows, …).
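For readers unfamiliar with the row-slice idea, the toy class below shows one way a matrix can be held as a list of row chunks so that operations touch one chunk at a time. It is a hypothetical stand-in written for this illustration, not `pymatrixeqtl.SlicedData`; the `slice_size` parameter and the centre-then-normalise behaviour of the standardising method are assumptions.

```python
import numpy as np

class ToyRowSlices:
    """Hypothetical illustration of row-slice chunking (not the real SlicedData)."""

    def __init__(self, mat, slice_size=1000):
        mat = np.asarray(mat, dtype=float)
        self._n_cols = mat.shape[1]
        # Store the matrix as consecutive blocks of at most `slice_size` rows.
        self.slices = [
            mat[i:i + slice_size].copy() for i in range(0, mat.shape[0], slice_size)
        ]

    def n_rows(self):
        return sum(s.shape[0] for s in self.slices)

    def n_cols(self):
        return self._n_cols

    def n_slices(self):
        return len(self.slices)

    def row_standardize_centered(self):
        """Centre every row and scale it to unit norm, one slice at a time."""
        for k, s in enumerate(self.slices):
            centred = s - s.mean(axis=1, keepdims=True)
            norm = np.linalg.norm(centred, axis=1, keepdims=True)
            norm[norm == 0] = 1.0  # leave constant rows as all zeros
            self.slices[k] = centred / norm

rng = np.random.default_rng(0)
toy = ToyRowSlices(rng.normal(size=(2500, 4)), slice_size=1000)
toy.row_standardize_centered()
```

Because each slice is processed independently, peak temporary memory scales with the slice size rather than with the full matrix.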
Public API
SlicedData, Matrix_eQTL_main, Matrix_eQTL_engine, MatrixEQTLResult,
modelLINEAR, modelANOVA, modelLINEAR_CROSS, MODEL_NAMES, eqtl,
load_example, EXAMPLE_DIR, plot_matrix_eqtl.
Accuracy vs R
pymatrixeqtl is verified against CRAN MatrixEQTL 2.3 on MatrixEQTL's own
bundled example dataset for all three models, both cis and trans:
| quantity | Pearson r vs R | max abs. difference |
|---|---|---|
| beta (slope) | 1.0000000 | < 2e-15 |
| t / F statistic | 1.0000000 | < 1.1e-11 |
| p-value | 1.0000000 | < 2e-14 |
| FDR | 1.0000000 | < 3e-14 |
cis / trans split counts match exactly. Differences are pure floating-point round-off — this is the same linear algebra.
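The two columns of the table can be computed for any pair of result vectors along the following lines. The helper `parity_metrics` is hypothetical, the sketch assumes both vectors have already been sorted so that the same SNP-gene pair sits at the same index, and the numbers are made up purely for illustration.

```python
import numpy as np

def parity_metrics(py_values, r_values):
    """Return (Pearson r, max absolute difference) between two aligned vectors."""
    py_values = np.asarray(py_values, dtype=float)
    r_values = np.asarray(r_values, dtype=float)
    pearson_r = np.corrcoef(py_values, r_values)[0, 1]
    max_abs_diff = np.max(np.abs(py_values - r_values))
    return pearson_r, max_abs_diff

# Made-up slopes, NOT real MatrixEQTL output: the "R" copy differs only
# by perturbations on the order of double-precision round-off.
py_beta = np.array([0.12, -0.53, 1.07, 0.004])
r_beta = py_beta + np.array([1e-16, -2e-16, 0.0, 1e-16])

pearson_r, max_abs_diff = parity_metrics(py_beta, r_beta)
```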
Run the parity tests yourself:
```bash
pytest tests/ -q
```
tests/test_r_parity.py drives R MatrixEQTL via the bundled
tests/r_reference_driver.R and skips gracefully if R is unavailable;
tests/test_smoke.py is pure Python.
Benchmark
```bash
python examples/benchmark.py --big
```
On a 5000 SNP × 2000 gene × 200 sample synthetic dataset the Python engine tests all 10,000,000 pairs in ~0.3 s.
See examples/compare_R_vs_Python.ipynb for a full executed R-vs-Python
timing / accuracy comparison with plots.
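To get a feel for why ten million pairs are cheap, the sketch below times only the central matrix product on random row-standardised matrices of the same shape as the benchmark. It is not `examples/benchmark.py`, it skips covariates, thresholds and p-values, and the elapsed time will depend on the BLAS library and hardware in use.

```python
import time
import numpy as np

def unit_rows(mat):
    """Centre each row and scale it to unit norm."""
    centred = mat - mat.mean(axis=1, keepdims=True)
    return centred / np.linalg.norm(centred, axis=1, keepdims=True)

rng = np.random.default_rng(1)
n_snps, n_genes, n_samples = 5000, 2000, 200

snps_std = unit_rows(rng.normal(size=(n_snps, n_samples)))
gene_std = unit_rows(rng.normal(size=(n_genes, n_samples)))

start = time.perf_counter()
corr = gene_std @ snps_std.T  # every gene-SNP correlation in one product
elapsed = time.perf_counter() - start

print(f"{corr.size:,} correlations in {elapsed:.3f} s")
```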
Citation
If you use this software, please cite the original MatrixEQTL paper:
Shabalin, A. A. (2012). Matrix eQTL: ultra fast eQTL analysis via large matrix operations. Bioinformatics, 28(10), 1353–1358.
License
LGPL-3, matching upstream MatrixEQTL. See LICENSE.