
Pure-Python port of CRAN MatrixEQTL -- ultra-fast eQTL mapping (linear / ANOVA / interaction models, cis/trans split, Benjamini-Hochberg FDR).

py-matrixeqtl

Pure-Python port of the R/CRAN package MatrixEQTL — Andrey A. Shabalin's ultra-fast eQTL mapping engine (Shabalin, Bioinformatics 2012, 28(10):1353–1358).

pymatrixeqtl is a standalone, dependency-light (numpy / scipy / pandas only) reimplementation of MatrixEQTL's complete computational core. It reproduces the upstream results to within floating-point round-off (see "Accuracy vs R"), and no R installation is needed to run it.

PyPI / import name   pymatrixeqtl
License              LGPL-3 (same as upstream MatrixEQTL)
Upstream             CRAN MatrixEQTL 2.3
Dependencies         numpy, scipy, pandas

What it does

MatrixEQTL tests every SNP–gene pair for association extremely fast. The trick — faithfully replicated here — is to residualise the expression and genotype matrices on the covariates once, scale each row to unit norm, and then obtain every test statistic as a single entry of one big matrix product (a correlation). The t / F statistic and p-value follow analytically.
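
The paragraph above can be illustrated with a few lines of NumPy. This is a minimal sketch of the idea for the additive linear model only, not the package's own code: the function names `residualize_and_normalize` and `linear_eqtl` are hypothetical, matrices are assumed to be laid out as rows = SNPs / genes / covariates and columns = samples, and the slope beta is left out for brevity.

```python
import numpy as np
from scipy import stats


def residualize_and_normalize(mat, cvrt):
    """Project intercept + covariates out of every row, then scale rows to unit norm."""
    n = mat.shape[1]
    design = np.column_stack([np.ones(n), cvrt.T])   # (n, k + 1): intercept and covariates
    q, _ = np.linalg.qr(design)                      # orthonormal basis of the covariate space
    resid = mat - (mat @ q) @ q.T                    # residuals of each row on the covariates
    return resid / np.linalg.norm(resid, axis=1, keepdims=True)


def linear_eqtl(snps, gene, cvrt):
    """Return partial correlation, t-statistic and two-sided p-value for every SNP-gene pair."""
    n, k = snps.shape[1], cvrt.shape[0]
    dof = n - k - 2                                  # samples minus covariates, intercept and SNP
    # One matrix product gives the correlation of every SNP with every gene.
    r = residualize_and_normalize(snps, cvrt) @ residualize_and_normalize(gene, cvrt).T
    t = r * np.sqrt(dof / (1.0 - r**2))              # analytic t-statistic from the correlation
    p = 2.0 * stats.t.sf(np.abs(t), dof)
    return r, t, p


# Small synthetic example: 5 SNPs, 4 genes, 2 covariates, 50 samples.
rng = np.random.default_rng(0)
snps = rng.integers(0, 3, size=(5, 50)).astype(float)
gene = rng.normal(size=(4, 50))
cvrt = rng.normal(size=(2, 50))
r, t, p = linear_eqtl(snps, gene, cvrt)
```

Each entry `t[i, j]` should agree with the t-statistic of the SNP coefficient in an ordinary least-squares fit of gene `j` on SNP `i` plus covariates, which is what makes the single matrix product equivalent to running every regression separately.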

  • Three models
    • modelLINEAR — additive linear model expr ~ SNP + covariates (reports slope beta, t-statistic, p-value).
    • modelANOVA — genotype as a categorical factor (F-test).
    • modelLINEAR_CROSS — linear model with a SNP × last-covariate interaction term.
  • cis / trans split — with SNP- and gene-position tables, associations are split into cis (within cisDist bp, default 1e6) and trans, each with its own p-value threshold.
  • Benjamini-Hochberg FDR — computed exactly as MatrixEQTL does, over all tested pairs (not only those passing the threshold).
  • Error-covariance whitening, the min.pv.by.genesnp tables and p-value histograms / Q-Q profiles — all ported faithfully.
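
To make the FDR bullet concrete, here is a hedged sketch of a Benjamini-Hochberg adjustment that is applied to the reported p-values but uses the total number of tested pairs as the multiplier. The function name `bh_fdr_with_total` is hypothetical and the code is an illustration of the idea, not the package's implementation.

```python
import numpy as np


def bh_fdr_with_total(pvalues, n_tests):
    """BH-adjust the given p-values, treating them as the smallest of n_tests p-values."""
    p = np.asarray(pvalues, dtype=float)
    order = np.argsort(p)                               # ascending p-values
    ranks = np.arange(1, p.size + 1)
    raw = p[order] * n_tests / ranks                    # p * m / rank
    adj = np.minimum.accumulate(raw[::-1])[::-1]        # enforce monotonicity from the largest down
    adj = np.minimum(adj, 1.0)                          # an FDR cannot exceed 1
    out = np.empty_like(adj)
    out[order] = adj                                    # restore the input order
    return out


# Three pairs passed the threshold, but ten pairs were tested in total.
reported = [0.01, 0.04, 0.03]
fdr_all_tests = bh_fdr_with_total(reported, n_tests=10)
fdr_reported_only = bh_fdr_with_total(reported, n_tests=len(reported))
```

Using the number of reported pairs instead of the number of tested pairs (`fdr_reported_only`) gives smaller, over-optimistic values, which is why the multiplier matters.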

Install

pip install pymatrixeqtl
# or, from a checkout:
pip install -e .

Quick start

import pymatrixeqtl as me

# MatrixEQTL ships an example dataset; it is bundled inside this package.
ex = me.load_example()

# cis / trans eQTL analysis, additive linear model
res = me.Matrix_eQTL_main(
    snps=ex["snps"], gene=ex["gene"], cvrt=ex["cvrt"],
    pvOutputThreshold=1e-2,            # trans threshold
    pvOutputThreshold_cis=1e-1,        # cis threshold
    snpspos=ex["snpspos"], genepos=ex["genepos"],
    useModel=me.modelLINEAR, cisDist=1e6,
)

res.cis     # tidy DataFrame: snps, gene, beta, statistic, pvalue, FDR
res.trans   # tidy DataFrame
print(res.summary())
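
Because the result tables are ordinary pandas DataFrames, downstream filtering needs nothing package-specific. The sketch below uses a stand-in table with made-up values so that it runs on its own; only the column names (snps, gene, beta, statistic, pvalue, FDR) are taken from the comment above, and in real use `cis` would be `res.cis`.

```python
import pandas as pd

# Stand-in for res.cis; the values are invented purely for illustration.
cis = pd.DataFrame({
    "snps": ["Snp_01", "Snp_02", "Snp_03", "Snp_04"],
    "gene": ["Gene_A", "Gene_A", "Gene_B", "Gene_B"],
    "beta": [0.8, -0.3, 0.5, 0.1],
    "statistic": [6.1, -2.2, 4.0, 0.9],
    "pvalue": [1e-8, 3e-2, 1e-4, 4e-1],
    "FDR": [1e-6, 6e-2, 2e-3, 5e-1],
})

# Keep associations with FDR below 5%.
significant = cis[cis["FDR"] < 0.05]

# For each gene, keep the SNP with the smallest p-value.
lead = significant.sort_values("pvalue").drop_duplicates("gene")
```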

Or use the one-call high-level wrapper, which accepts file paths, NumPy arrays or pandas DataFrames:

res = me.eqtl(
    "SNP.txt", "GE.txt", "Covariates.txt",
    model=me.modelANOVA, pv_threshold=1e-5,
    snpspos="snpsloc.txt", genepos="geneloc.txt",
    pv_threshold_cis=1e-3, cis_dist=1e6,
)
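
The position tables and the cis distance decide which pairs count as cis. The sketch below shows one plausible way to build that classification. It assumes the upstream MatrixEQTL layout of the location tables (SNP id, chromosome, position; gene id, chromosome, start, end); the column names and the helper `cis_mask` are hypothetical and are not part of the package API.

```python
import pandas as pd

# Made-up positions for three SNPs and two genes.
snpspos = pd.DataFrame({
    "snpid": ["Snp_01", "Snp_02", "Snp_03"],
    "chr": ["chr1", "chr1", "chr2"],
    "pos": [1_500_000, 5_000_000, 2_200_000],
})
genepos = pd.DataFrame({
    "geneid": ["Gene_A", "Gene_B"],
    "chr": ["chr1", "chr2"],
    "left": [1_000_000, 3_000_000],
    "right": [1_100_000, 3_050_000],
})


def cis_mask(snpspos, genepos, cis_dist=1e6):
    """Boolean (n_snps, n_genes) matrix: True where the SNP lies within cis_dist of the gene."""
    snp_chr = snpspos["chr"].to_numpy()[:, None]
    gene_chr = genepos["chr"].to_numpy()[None, :]
    pos = snpspos["pos"].to_numpy()[:, None]
    left = genepos["left"].to_numpy()[None, :]
    right = genepos["right"].to_numpy()[None, :]
    same_chr = snp_chr == gene_chr
    near = (pos >= left - cis_dist) & (pos <= right + cis_dist)
    return same_chr & near


mask = cis_mask(snpspos, genepos, cis_dist=1e6)   # True = cis pair, False = trans pair
```

Pairs where `mask` is True would be compared against the cis threshold, and all remaining pairs against the trans threshold.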

The SlicedData container

MatrixEQTL's chunked matrix container is ported as pymatrixeqtl.SlicedData, a NumPy-backed class that keeps the original row-slice chunking, so large matrices are processed block by block as in upstream.

R method                          Python method
$new(mat) / CreateFromMatrix      SlicedData(mat) / create_from_matrix
LoadFile                          load_file
nRows() / nCols() / nSlices()     n_rows() / n_cols() / n_slices()
RowStandardizeCentered            row_standardize_centered
ColumnSubsample                   column_subsample
RowReorder                        row_reorder
ResliceCombined                   reslice_combined
CombineInOneSlice                 combine_in_one_slice
FindRow                           find_row
SetNanRowMean                     set_nan_row_mean

The original R names are also available as aliases (SlicedData.LoadFile, SlicedData.nRows, …).
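
For readers unfamiliar with the row-slice idea, the toy class below shows what chunking a matrix by rows looks like. It is a deliberately simplified stand-in, not pymatrixeqtl.SlicedData: the class name `ToySlicedMatrix`, the `slice_size` parameter and the `map_slices` helper are all hypothetical, and only the `n_rows` / `n_cols` / `n_slices` names mirror the table above.

```python
import numpy as np


class ToySlicedMatrix:
    """Toy container that stores a matrix as a list of row blocks."""

    def __init__(self, mat, slice_size=1000):
        mat = np.asarray(mat, dtype=float)
        # Consecutive blocks of at most slice_size rows each.
        self.slices = [mat[i:i + slice_size] for i in range(0, mat.shape[0], slice_size)]

    def n_rows(self):
        return sum(s.shape[0] for s in self.slices)

    def n_cols(self):
        return self.slices[0].shape[1] if self.slices else 0

    def n_slices(self):
        return len(self.slices)

    def map_slices(self, fn):
        """Apply a row-wise transformation one block at a time."""
        self.slices = [fn(s) for s in self.slices]


def center_and_scale_rows(block):
    """Center each row and scale it to unit norm (all-zero rows are left at zero)."""
    centered = block - block.mean(axis=1, keepdims=True)
    norm = np.linalg.norm(centered, axis=1, keepdims=True)
    norm[norm == 0] = 1.0
    return centered / norm


toy = ToySlicedMatrix(np.arange(50.0).reshape(10, 5), slice_size=4)
toy.map_slices(center_and_scale_rows)
```

Working block by block means only one slice needs to be transformed at a time, which is the point of keeping the chunked layout for large matrices.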

Public API

SlicedData, Matrix_eQTL_main, Matrix_eQTL_engine, MatrixEQTLResult, modelLINEAR, modelANOVA, modelLINEAR_CROSS, MODEL_NAMES, eqtl, load_example, EXAMPLE_DIR, plot_matrix_eqtl.

Accuracy vs R

pymatrixeqtl is verified against CRAN MatrixEQTL 2.3 on MatrixEQTL's own bundled example dataset for all three models, both cis and trans:

quantity           Pearson r vs R   max |diff|
beta (slope)       1.0000000        < 2e-15
t / F statistic    1.0000000        < 1.1e-11
p-value            1.0000000        < 2e-14
FDR                1.0000000        < 3e-14

cis / trans split counts match exactly. Differences are pure floating-point round-off — this is the same linear algebra.
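
A comparison of this kind can be reproduced on any pair of result tables with a short helper. The sketch below is an illustration rather than the project's test code: `parity_report` is a hypothetical name, both tables are assumed to share the columns snps, gene, beta, statistic, pvalue and FDR, and the demo data are synthetic (the "R" table is just the Python one, shuffled and perturbed by tiny noise).

```python
import numpy as np
import pandas as pd


def parity_report(py_df, r_df, cols=("beta", "statistic", "pvalue", "FDR")):
    """Match rows on (snps, gene) and report Pearson r and max absolute difference per column."""
    merged = py_df.merge(r_df, on=["snps", "gene"], suffixes=("_py", "_r"),
                         validate="one_to_one")
    rows = []
    for c in cols:
        a = merged[f"{c}_py"].to_numpy()
        b = merged[f"{c}_r"].to_numpy()
        rows.append({
            "quantity": c,
            "pearson_r": np.corrcoef(a, b)[0, 1],
            "max_abs_diff": np.max(np.abs(a - b)),
        })
    return pd.DataFrame(rows)


# Synthetic demo tables standing in for the Python and R outputs.
rng = np.random.default_rng(1)
py_df = pd.DataFrame({
    "snps": [f"Snp_{i:02d}" for i in range(20)],
    "gene": [f"Gene_{i % 4}" for i in range(20)],
    "beta": rng.normal(size=20),
    "statistic": rng.normal(size=20),
    "pvalue": rng.uniform(size=20),
    "FDR": rng.uniform(size=20),
})
r_df = py_df.sample(frac=1.0, random_state=0).reset_index(drop=True)   # different row order
for col in ("beta", "statistic", "pvalue", "FDR"):
    r_df[col] = r_df[col] + 1e-15 * rng.normal(size=20)                # round-off-sized noise

report = parity_report(py_df, r_df)
```

Merging on the (snps, gene) key rather than relying on row order matters because the two engines need not emit associations in the same order.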

Run the parity tests yourself:

pytest tests/ -q

tests/test_r_parity.py drives R MatrixEQTL via the bundled tests/r_reference_driver.R and skips gracefully if R is unavailable; tests/test_smoke.py is pure Python.

Benchmark

python examples/benchmark.py --big

On a 5000 SNP × 2000 gene × 200 sample synthetic dataset the Python engine tests all 10,000,000 pairs in ~0.3 s.

See examples/compare_R_vs_Python.ipynb for a full executed R-vs-Python timing / accuracy comparison with plots.

Citation

If you use this software, please cite the original MatrixEQTL paper:

Shabalin, A. A. (2012). Matrix eQTL: ultra fast eQTL analysis via large matrix operations. Bioinformatics, 28(10), 1353–1358.

License

LGPL-3, matching upstream MatrixEQTL. See LICENSE.
