Skip to main content

GPU-accelerated ancestry-aware cis-QTL mapping library with tensorQTL-compatible APIs

Project description

localQTL

localQTL is a pure-Python library for local-ancestry-aware xQTL mapping that lets researchers run end-to-end analyses on large cohorts without any R/rpy2 dependencies. It preserves the familiar tensorQTL data model with GPU-first execution paths, flexible genotype loaders, and streaming outputs for large-scale workflows, while adding ancestry-aware use cases.

Features

  • GPU-accelerated workflows powered by PyTorch, with automatic fallbacks to CPU execution when CUDA is unavailable.
  • Modular cis-QTL mapping API exposed through functional helpers such as map_nominal, map_permutations, and map_independent, or via the convenience wrapper CisMapper.
  • Multiple genotype backends including PLINK (BED/BIM/FAM), PLINK 2 (PGEN/PSAM/PVAR), and BED/Parquet inputs, with helpers to stream data in manageable windows.
  • Local ancestry integration by pairing genotype windows with haplotype panels produced by RFMix through RFMixReader and InputGeneratorCisWithHaps.
  • Parquet streaming sinks that make it easy to materialise association statistics without loading the entire result set in memory.
  • Pure-Python statistics (no R/rpy2 required): tensorQTL's rfunc calls have been refactored to scipy.stats for p-values and q-values are computed with the Python port of Storey's qvalue (py-qvalue).
  • Fine-mapping and colocalization helpers via localqtl.susie, localqtl.coloc, and cis post-processing utilities that annotate nominal or permutation outputs with credible sets and COLOC posterior support.

Installation

Install the latest release from PyPI:

pip install localqtl

The project uses Poetry for dependency management. Clone the repository and install the package into a virtual environment:

poetry install

If you prefer pip, you can install the library in editable mode after exporting Poetry's dependency specification:

pip install -e .

Note: GPU acceleration relies on PyTorch, CuPy, and cuDF. Make sure you use versions that match the CUDA toolkit available on your system. The versions in pyproject.toml target CUDA 12.

Quickstart

Below is a minimal example that runs a nominal cis-QTL scan against PLINK-formatted genotypes and BED-formatted phenotypes. The example mirrors the data layout expected by tensorQTL, so existing preprocessing pipelines can be reused.

Standard mapping (tensorQTL equivalent)

from localqtl import PlinkReader, read_phenotype_bed
from localqtl.cis import map_nominal

# Load genotypes and variant metadata
plink = PlinkReader("data/genotypes")
genotype_df = plink.load_genotypes()
variant_df = plink.bim.set_index("snp")[["chrom", "pos"]]

# Load phenotypes (BED-style) and their genomic coordinates
phenotype_df, phenotype_pos_df = read_phenotype_bed("data/phenotypes.bed")

# Optional: load covariates as a DataFrame indexed by sample IDs
covariates_df = None

results = map_nominal(
    genotype_df=genotype_df,
    variant_df=variant_df,
    phenotype_df=phenotype_df,
    phenotype_pos_df=phenotype_pos_df,
    covariates_df=covariates_df,
    window=1_000_000,         # ±1 Mb cis window
    maf_threshold=0.01,       # filter on in-sample MAF
    device="auto",            # picks CUDA when available, otherwise CPU
    out_prefix="cis_nominal", # default prefix
    return_df=True            # default is False, parquet streamed sink
)

print(results.head())

Covariate interaction scans

map_nominal, map_permutations, map_independent, and CisMapper also support sample-level genotype-by-covariate interaction models through the interaction_covariate argument. Pass either a covariate-column name, a pandas.Series indexed by sample ID, or a 1D array aligned to the sample order.

from localqtl.cis import map_nominal

results = map_nominal(
    genotype_df=genotype_df,
    variant_df=variant_df,
    phenotype_df=phenotype_df,
    phenotype_pos_df=phenotype_pos_df,
    covariates_df=covariates_df,
    interaction_covariate="age",
    device="auto",
    out_dir=None,
)

print(results[["variant_id", "pval_nominal", "pval_g", "pval_i", "pval_gi"]].head())

This emits the standard nominal columns plus canonical joint-model statistics for y ~ g + i + g:i: b_g, b_i, b_gi, their standard errors, t-stats, and p-values. interaction_covariate cannot be combined with ancestry_model="interaction".

For analyses that combine nominal scans, permutations, and independent signal calling, the CisMapper class offers a thin object-oriented façade:

from localqtl.cis import CisMapper

mapper = CisMapper(
    genotype_df=genotype_df,
    variant_df=variant_df,
    phenotype_df=phenotype_df,
    phenotype_pos_df=phenotype_pos_df,
    covariates_df=covariates_df,
    window=500_000,
    maf_threshold=0.01,
    out_dir="./",
    out_prefix="cis_nominal",
)

mapper.map_nominal(nperm=0)
perm_df = mapper.map_permutations(nperm=1_000, beta_approx=True)
perm_df = mapper.calculate_qvalues(perm_df, fdr=0.05)
lead_df = mapper.map_independent(cis_df=perm_df, fdr=0.05)

Local ancestry-aware mapping

localQTL can incorporate local ancestry (e.g., from RFMix) so that cis-xQTL tests are performed with ancestry-aware genotype inputs. To enable this, pass haplotypes (per-variant × per-sample × per-ancestry dosages) and loci_df (variant positions corresponding to the haplotype tensor) to CisMapper. When these are provided, CisMapper switches to the ancestry-aware input generator under the hood.

from localqtl import PlinkReader, read_phenotype_bed
from localqtl.haplotypeio import RFMixReader
from localqtl.cis import CisMapper

plink = PlinkReader("data/genotypes")
genotype_df = plink.load_genotypes()  # samples as columns, variants as rows
variant_df = plink.bim.set_index("snp")[["chrom", "pos"]]

phenotype_df, phenotype_pos_df = read_phenotype_bed("data/phenotypes.bed")
covariates_df = None  # optional

# Local ancestry from RFMix (and align to genotype samples)
select_samples = genotype_df.columns.values
rfmix = RFMixReader(
    prefix_path="data/rfmix/prefix",   # directory with per-chrom outputs + fb.tsv
    binary_path="data/rfmix",          # where prebuilt binaries live (if used)
    select_samples=select_samples,     # optional for automatic sample alignment (recommended)
    verbose=True
)

# Materialize ancestry haplotypes into memory (NumPy array)
# Shape: (variants, samples, ancestries) for >2 ancestries.
# For 2 ancestries, the reader exposes the ancestry channel selected internally.
H = rfmix.load_haplotypes()
loci_df = rfmix.loci_df  # DataFrame with ['chrom','pos', 'ancestry', 'hap', 'index'] (indexed by 'hap')

# (Optional) Ensure chromosome dtype matches between variant_df and loci_df
variant_df["chrom"] = variant_df["chrom"].astype(str)
loci_df = loci_df.copy()
loci_df["chrom"] = loci_df["chrom"].astype(str)

# Ancestry-aware mapping
mapper = CisMapper(
    genotype_df=genotype_df,
    variant_df=variant_df,
    phenotype_df=phenotype_df,
    phenotype_pos_df=phenotype_pos_df,
    covariates_df=covariates_df,
    haplotypes=H,          # <-- enable local ancestry-aware mode
    loci_df=loci_df,       # <-- positions that align to H
    window=1_000_000,
    maf_threshold=0.01,
    device="auto",
    out_dir="./",
    out_prefix="cis_nominal",
)

# Run nominal scans and permutations as usual; ancestry-awareness is automatic
mapper.map_nominal(nperm=0)
perm_df = mapper.map_permutations(nperm=1_000, beta_approx=True)

# FDR using the pure-Python q-value port (no R/rpy2 needed)
perm_df = mapper.calculate_qvalues(perm_df, fdr=0.05)

# Identify conditionally independent signals
lead_df = mapper.map_independent(cis_df=perm_df, fdr=0.05)

print(lead_df.head())

Fine-mapping and colocalization

The branch also adds SuSiE fine-mapping and COLOC colocalization helpers for downstream interpretation of cis hits:

from localqtl import coloc, susie

susie_summary = susie.map(
    genotype_df=genotype_df,
    variant_df=variant_df,
    phenotype_df=phenotype_df,
    phenotype_pos_df=phenotype_pos_df,
    covariates_df=covariates_df,
    device="auto",
)

coloc_df = coloc.run_pairs(
    genotype_df=genotype_df,
    variant_df=variant_df,
    phenotype1_df=phenotype_df,
    phenotype2_df=external_trait_df,
    phenotype_pos_df=phenotype_pos_df,
    covariates1_df=covariates_df,
    device="auto",
)

susie.map() returns a summary table of credible-set membership with per-variant PIPs. coloc.run_pairs() returns one row per phenotype with posterior support for hypotheses H0 through H4, including shared-signal support in pp_h4_abf.

To annotate downstream cis outputs:

from localqtl.cis.postproc import annotate_with_coloc, annotate_with_susie

nominal_annotated = annotate_with_susie(results, susie_summary)
perm_annotated = annotate_with_coloc(perm_df, coloc_df)

Input expectations

  • H (from RFMixReader.load_haplotypes()) is ancestry dosages with shape (variants, samples, ancestries) for ≥3 ancestries; for 2 ancestries the reader exposes the configured channel.
  • loci_df rows correspond 1:1 to H's and be joinable (by chrom/pos) to variant_df used for genotypes.
  • Sample order in genotype_df, phenotype_df, and covariates_df should match H (reindex as shown).
  • The cis-mapping helpers default to preload_haplotypes=True so ancestry blocks are staged as contiguous tensors on the requested device (GPU or CPU). Override this flag when working under strict memory constraints.

Testing

Run the test suite with:

poetry run pytest

This exercises the core cis-QTL mapping routines using small synthetic datasets.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

localqtl-0.1.4.tar.gz (82.4 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

localqtl-0.1.4-py3-none-any.whl (91.4 kB view details)

Uploaded Python 3

File details

Details for the file localqtl-0.1.4.tar.gz.

File metadata

  • Download URL: localqtl-0.1.4.tar.gz
  • Upload date:
  • Size: 82.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.2.0 CPython/3.10.9 Linux/4.18.0-553.22.1.el8_10.x86_64

File hashes

Hashes for localqtl-0.1.4.tar.gz
Algorithm Hash digest
SHA256 d7f1642d62f267c376a53a316402198a68f90bdb2bacb269951fa3ef37e5a2bc
MD5 6d7a3be2ea1ba19e9b2e4de32ad8ac67
BLAKE2b-256 a125839dedd31085f403c41acb85f28994d08c800162515b4339a234798ca554

See more details on using hashes here.

File details

Details for the file localqtl-0.1.4-py3-none-any.whl.

File metadata

  • Download URL: localqtl-0.1.4-py3-none-any.whl
  • Upload date:
  • Size: 91.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.2.0 CPython/3.10.9 Linux/4.18.0-553.22.1.el8_10.x86_64

File hashes

Hashes for localqtl-0.1.4-py3-none-any.whl
Algorithm Hash digest
SHA256 c60d2e41a47f0f06dfe1c1f2a4704313ebb41bf29ff9114b1aac44b4c743b84f
MD5 3258ef92fb5f812a45777e2304257533
BLAKE2b-256 f57f659e22873fcecf218e094341efaabd9036e453b7c8fbb07a27324df85a79

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page