
GPU-accelerated ancestry-aware cis-QTL mapping library with tensorQTL-compatible APIs


localQTL

localQTL is a pure-Python library for local-ancestry-aware xQTL mapping that lets researchers run end-to-end analyses on large cohorts without any R/rpy2 dependencies. It preserves the familiar tensorQTL data model, with GPU-first execution paths, flexible genotype loaders, and streaming outputs for large-scale workflows, and extends it to ancestry-aware analyses.

Features

  • GPU-accelerated workflows powered by PyTorch, with automatic fallback to CPU execution when CUDA is unavailable.
  • Modular cis-QTL mapping API exposed through functional helpers such as map_nominal, map_permutations, and map_independent, or via the convenience wrapper CisMapper.
  • Multiple genotype backends including PLINK (BED/BIM/FAM), PLINK 2 (PGEN/PSAM/PVAR), and BED/Parquet inputs, with helpers to stream data in manageable windows.
  • Local ancestry integration by pairing genotype windows with haplotype panels produced by RFMix through RFMixReader and InputGeneratorCisWithHaps.
  • Parquet streaming sinks that write association statistics to disk incrementally, so the full result set never has to be held in memory.
  • Pure-Python statistics (no R/rpy2 required): tensorQTL's rfunc calls have been replaced with scipy.stats for p-values, and q-values are computed with py-qvalue, the Python port of Storey's qvalue.
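For readers unfamiliar with Storey's method, the sketch below shows the core q-value computation in plain Python. It is an illustration only, not localQTL's or py-qvalue's code, and storey_qvalues is a hypothetical name. It assumes a single fixed λ (lam=0.5) for estimating π₀, the proportion of true nulls; the full method as implemented in R's qvalue estimates π₀ over a grid of λ values with smoothing.

```python
def storey_qvalues(pvalues, lam=0.5):
    """Return Storey q-values for a list of p-values (single-lambda pi0 sketch)."""
    m = len(pvalues)
    if m == 0:
        return []
    # pi0: estimated proportion of true null hypotheses, capped at 1.
    # Assumption: one fixed lambda instead of qvalue's smoothed lambda grid.
    n_above = sum(1 for p in pvalues if p >= lam)
    pi0 = min(1.0, n_above / (m * (1.0 - lam)))
    # Indices of p-values sorted ascending (rank 1 = smallest p).
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Step down from the largest p-value so q-values stay monotone in p.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pi0 * pvalues[idx] * m / rank)
        qvalues[idx] = running_min
    return qvalues
```

Each q-value is the smallest π₀-scaled false discovery rate at which that test would still be called significant, which is why the loop keeps a running minimum from the largest p-value downward.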

Installation

Install the latest release from PyPI:

pip install localqtl

The project uses Poetry for dependency management. Clone the repository and install the package into a virtual environment:

poetry install

If you prefer pip, you can install the library in editable mode after exporting Poetry's dependency specification:

pip install -e .

Note: GPU acceleration relies on PyTorch, CuPy, and cuDF. Make sure you use versions that match the CUDA toolkit available on your system. The versions in pyproject.toml target CUDA 12.
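To check which CUDA build of PyTorch is installed and whether a GPU is visible, a small helper like the one below can be run before mapping. It only inspects PyTorch (torch.__version__, torch.version.cuda, torch.cuda.is_available()); CuPy and cuDF are not checked. The resolve_device function is a hypothetical approximation of what device="auto" is documented to do (pick CUDA when available, otherwise CPU), not localQTL's own implementation.

```python
def cuda_report():
    """Summarize the local PyTorch/CUDA setup (fields are None if torch is absent)."""
    try:
        import torch
    except ImportError:
        return {"torch": None, "torch_cuda_build": None, "cuda_available": False}
    return {
        "torch": torch.__version__,
        # CUDA version the wheel was built against (e.g. "12.1"); None for CPU-only builds
        "torch_cuda_build": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }


def resolve_device(device="auto"):
    """Hypothetical stand-in for device="auto": prefer CUDA, otherwise CPU."""
    if device != "auto":
        return device
    return "cuda" if cuda_report()["cuda_available"] else "cpu"


if __name__ == "__main__":
    print(cuda_report())
    print("device='auto' would resolve to:", resolve_device())
```

If torch_cuda_build reports a major version other than 12, the installed PyTorch wheel does not match the CUDA 12 target in pyproject.toml.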

Quickstart

Below is a minimal example that runs a nominal cis-QTL scan against PLINK-formatted genotypes and BED-formatted phenotypes. The example mirrors the data layout expected by tensorQTL, so existing preprocessing pipelines can be reused.
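As a reminder of that layout, the sketch below parses a tab-separated phenotype BED in the tensorQTL style, assuming the first four columns are #chr, start, end, and phenotype_id, followed by one column per sample. It uses only the standard library and is meant to clarify the file structure; read_phenotype_bed is what localQTL actually provides, and it returns DataFrames rather than the plain dictionaries used here. The function name parse_phenotype_bed and the toy gene IDs and values are hypothetical.

```python
import csv
import io


def parse_phenotype_bed(text):
    """Split a phenotype BED into sample IDs, per-phenotype values and coordinates."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    samples = header[4:]  # everything after the four coordinate/ID columns
    values, positions = {}, {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        chrom, start, end, phenotype_id = row[:4]
        positions[phenotype_id] = (chrom, int(start), int(end))
        values[phenotype_id] = [float(x) for x in row[4:]]
        if len(values[phenotype_id]) != len(samples):
            raise ValueError(f"{phenotype_id}: expected {len(samples)} values")
    return samples, values, positions


# Toy file: two phenotypes measured in three samples
example = (
    "#chr\tstart\tend\tphenotype_id\tS1\tS2\tS3\n"
    "chr1\t999\t1000\tGENE_A\t0.5\t-1.2\t0.3\n"
    "chr2\t4999\t5000\tGENE_B\t1.1\t0.0\t-0.4\n"
)
samples, values, positions = parse_phenotype_bed(example)
```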

Standard mapping (tensorQTL equivalent)

from localqtl import PlinkReader, read_phenotype_bed
from localqtl.cis import map_nominal

# Load genotypes and variant metadata
plink = PlinkReader("data/genotypes")
genotype_df = plink.load_genotypes()
variant_df = plink.bim.set_index("snp")[["chrom", "pos"]]

# Load phenotypes (BED-style) and their genomic coordinates
phenotype_df, phenotype_pos_df = read_phenotype_bed("data/phenotypes.bed")

# Optional: load covariates as a DataFrame indexed by sample IDs
covariates_df = None

results = map_nominal(
    genotype_df=genotype_df,
    variant_df=variant_df,
    phenotype_df=phenotype_df,
    phenotype_pos_df=phenotype_pos_df,
    covariates_df=covariates_df,
    window=1_000_000,         # ±1 Mb cis window
    maf_threshold=0.01,       # filter on in-sample MAF
    device="auto",            # picks CUDA when available, otherwise CPU
    out_prefix="cis_nominal", # default prefix
    return_df=True            # default False streams results to a Parquet sink instead
)

print(results.head())
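The window and maf_threshold arguments can be read as follows: a variant is tested against a phenotype only if it lies on the same chromosome within ±window bp of the phenotype position and its in-sample minor allele frequency reaches maf_threshold. The sketch below spells that out in plain Python. It is an illustration, not localQTL's code: it assumes a single position per phenotype, 0/1/2 dosages with None for missing calls, and inclusive boundaries on both filters; localQTL's exact boundary handling may differ. Function names and the toy variants are hypothetical.

```python
def in_sample_maf(dosages):
    """Minor allele frequency from 0/1/2 dosages, ignoring missing (None) calls."""
    observed = [d for d in dosages if d is not None]
    if not observed:
        return 0.0
    alt_freq = sum(observed) / (2 * len(observed))
    return min(alt_freq, 1.0 - alt_freq)


def cis_variants(pheno_chrom, pheno_pos, variants, window=1_000_000, maf_threshold=0.01):
    """Return IDs of variants inside the cis window that pass the MAF filter.

    variants: iterable of (variant_id, chrom, pos, dosages) tuples.
    """
    kept = []
    for variant_id, chrom, pos, dosages in variants:
        if chrom != pheno_chrom or abs(pos - pheno_pos) > window:
            continue  # outside the +/- window on the same chromosome
        if in_sample_maf(dosages) < maf_threshold:
            continue  # too rare (or monomorphic) in this sample
        kept.append(variant_id)
    return kept


toy_variants = [
    ("rs1", "chr1", 1_000_000, [0, 1, 2, 1]),  # 500 kb away, common
    ("rs2", "chr1", 3_000_000, [0, 1, 1, 0]),  # 1.5 Mb away
    ("rs3", "chr2", 1_500_000, [0, 1, 2, 1]),  # different chromosome
    ("rs4", "chr1", 1_600_000, [0, 0, 0, 0]),  # monomorphic
    ("rs5", "chr1", 2_500_000, [0, 0, 0, 1]),  # exactly 1 Mb away, MAF 0.125
]
print(cis_variants("chr1", 1_500_000, toy_variants))  # prints ['rs1', 'rs5']
```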

For analyses that combine nominal scans, permutations, and independent signal calling, the CisMapper class offers a thin object-oriented façade:

from localqtl.cis import CisMapper

mapper = CisMapper(
    genotype_df=genotype_df,
    variant_df=variant_df,
    phenotype_df=phenotype_df,
    phenotype_pos_df=phenotype_pos_df,
    covariates_df=covariates_df,
    window=500_000,
    maf_threshold=0.01,
    out_dir="./",
    out_prefix="cis_nominal",
)

mapper.map_nominal(nperm=0)
perm_df = mapper.map_permutations(nperm=1_000, beta_approx=True)
perm_df = mapper.calculate_qvalues(perm_df, fdr=0.05)
lead_df = mapper.map_independent(cis_df=perm_df, fdr=0.05)
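Conceptually, map_permutations calibrates each phenotype's best cis association against a null built by shuffling sample labels. The sketch below shows the plain empirical version of that idea for one phenotype: the statistic is the maximum squared Pearson correlation across its cis variants, and the p-value is (1 + number of permutations at least as extreme) / (nperm + 1). It does not implement beta_approx=True, which in tensorQTL fits a beta distribution to the permutation null so p-values can go below 1/(nperm + 1), and it ignores covariates. All names are hypothetical.

```python
import random
import statistics


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def max_r2(genotypes, phenotype):
    """Best squared correlation of the phenotype with any cis variant."""
    return max(pearson_r(g, phenotype) ** 2 for g in genotypes)


def empirical_pvalue(genotypes, phenotype, nperm=1_000, seed=0):
    """Permutation p-value for the phenotype's strongest cis association."""
    rng = random.Random(seed)
    observed = max_r2(genotypes, phenotype)
    shuffled = list(phenotype)
    n_extreme = 0
    for _ in range(nperm):
        # Shuffle phenotype values only: breaks the genotype-phenotype link
        # while keeping LD among the variants intact.
        rng.shuffle(shuffled)
        if max_r2(genotypes, shuffled) >= observed:
            n_extreme += 1
    return (n_extreme + 1) / (nperm + 1)
```

Taking the maximum over all cis variants in every permutation is what corrects for the number of (correlated) variants tested per phenotype.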

Local ancestry-aware mapping

localQTL can incorporate local ancestry (e.g., from RFMix) so that cis-xQTL tests are performed with ancestry-aware genotype inputs. To enable this, pass haplotypes (per-variant × per-sample × per-ancestry dosages) and loci_df (variant positions corresponding to the haplotype tensor) to CisMapper. When these are provided, CisMapper switches to the ancestry-aware input generator under the hood.
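To make the haplotypes layout concrete: if each sample's two haplotypes carry a local-ancestry call at every variant, one natural per-ancestry dosage is the number of the sample's haplotypes assigned to each ancestry (0, 1, or 2). The sketch below builds that (variants × samples × ancestries) array from hard calls in plain Python. This is an assumption about what the dosages represent, not localQTL's or RFMixReader's code; with RFMix forward-backward posteriors the counts would instead be expected (fractional) values. It also shows why two ancestries need only one channel: the second is always 2 minus the first. The function name and toy calls are hypothetical.

```python
def ancestry_dosages(local_ancestry, n_ancestries):
    """Build H[v][s][k]: haplotype count of ancestry k for sample s at variant v.

    local_ancestry[v][s] is a (hap1_ancestry, hap2_ancestry) pair of integer
    labels in range(n_ancestries), e.g. hard calls derived from RFMix output.
    """
    H = []
    for calls_at_variant in local_ancestry:
        per_sample = []
        for hap1, hap2 in calls_at_variant:
            counts = [0] * n_ancestries
            counts[hap1] += 1
            counts[hap2] += 1
            per_sample.append(counts)
        H.append(per_sample)
    return H


# Two variants x three samples, two ancestries (labels 0 and 1); toy calls
calls = [
    [(0, 0), (0, 1), (1, 1)],
    [(0, 1), (1, 0), (0, 0)],
]
H = ancestry_dosages(calls, n_ancestries=2)
# With two ancestries the second channel is redundant: H[v][s][1] == 2 - H[v][s][0]
single_channel = [[counts[0] for counts in per_sample] for per_sample in H]
```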

from localqtl import PlinkReader, read_phenotype_bed
from localqtl.haplotypeio import RFMixReader
from localqtl.cis import CisMapper

plink = PlinkReader("data/genotypes")
genotype_df = plink.load_genotypes()  # samples as columns, variants as rows
variant_df = plink.bim.set_index("snp")[["chrom", "pos"]].copy()  # copy: chrom is modified below

phenotype_df, phenotype_pos_df = read_phenotype_bed("data/phenotypes.bed")
covariates_df = None  # optional

# Local ancestry from RFMix (and align to genotype samples)
select_samples = genotype_df.columns.values
rfmix = RFMixReader(
    prefix_path="data/rfmix/prefix",   # directory with per-chrom outputs + fb.tsv
    binary_path="data/rfmix",          # where prebuilt binaries live (if used)
    select_samples=select_samples,     # optional for automatic sample alignment (recommended)
    verbose=True
)

# Materialize ancestry haplotypes into memory (NumPy array)
# Shape: (variants, samples, ancestries) for >2 ancestries.
# For 2 ancestries, the reader exposes the ancestry channel selected internally.
H = rfmix.load_haplotypes()
loci_df = rfmix.loci_df  # DataFrame with ['chrom','pos', 'ancestry', 'hap', 'index'] (indexed by 'hap')

# (Optional) Ensure chromosome dtype matches between variant_df and loci_df
variant_df["chrom"] = variant_df["chrom"].astype(str)
loci_df = loci_df.copy()
loci_df["chrom"] = loci_df["chrom"].astype(str)

# Ancestry-aware mapping
mapper = CisMapper(
    genotype_df=genotype_df,
    variant_df=variant_df,
    phenotype_df=phenotype_df,
    phenotype_pos_df=phenotype_pos_df,
    covariates_df=covariates_df,
    haplotypes=H,          # <-- enable local ancestry-aware mode
    loci_df=loci_df,       # <-- positions that align to H
    window=1_000_000,
    maf_threshold=0.01,
    device="auto",
    out_dir="./",
    out_prefix="cis_nominal",
)

# Run nominal scans and permutations as usual; ancestry-awareness is automatic
mapper.map_nominal(nperm=0)
perm_df = mapper.map_permutations(nperm=1_000, beta_approx=True)

# FDR using the pure-Python q-value port (no R/rpy2 needed)
perm_df = mapper.calculate_qvalues(perm_df, fdr=0.05)

# Identify conditionally independent signals
lead_df = mapper.map_independent(cis_df=perm_df, fdr=0.05)

print(lead_df.head())
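map_independent looks for additional, conditionally independent signals beyond each phenotype's lead variant. The general idea can be sketched as greedy forward selection: pick the variant most correlated with the phenotype, regress it out of the phenotype and the remaining genotypes, and repeat while the best partial association stays above a threshold. The code below is a deliberately simplified stand-in, not localQTL's method: it uses a fixed r² cutoff (the hypothetical r2_threshold) instead of a permutation-calibrated, FDR-based threshold, skips the backward pass that tensorQTL's implementation performs, and ignores covariates. All names are hypothetical.

```python
from statistics import fmean


def pearson_r(x, y, eps=1e-12):
    """Pearson correlation; returns 0 when either vector is (numerically) constant."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx <= eps or syy <= eps:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def residualize(y, x):
    """Residuals of y after least-squares regression on x with an intercept."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        return [b - my for b in y]
    beta = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return [b - my - beta * (a - mx) for a, b in zip(x, y)]


def forward_select(genotypes, phenotype, r2_threshold=0.1, max_signals=5):
    """Greedy forward selection of conditionally independent variants.

    genotypes: dict variant_id -> dosage list (same sample order as phenotype).
    """
    resid_g = {vid: list(g) for vid, g in genotypes.items()}
    y = list(phenotype)
    selected = []
    for _ in range(max_signals):
        candidates = [(pearson_r(g, y) ** 2, vid)
                      for vid, g in resid_g.items() if vid not in selected]
        if not candidates:
            break
        best_r2, best_id = max(candidates)
        if best_r2 < r2_threshold:
            break  # no remaining variant explains enough residual variance
        selected.append(best_id)
        x = resid_g[best_id]
        # Condition the phenotype and all genotypes on the newly selected variant
        y = residualize(y, x)
        resid_g = {vid: residualize(g, x) for vid, g in resid_g.items()}
    return selected
```

Residualizing the genotypes as well as the phenotype (a Gram-Schmidt step) makes each round's correlation a partial correlation given all variants selected so far.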

Input expectations

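  • H (from RFMixReader.load_haplotypes()) holds ancestry dosages with shape (variants, samples, ancestries) when there are three or more ancestries; with two ancestries, the reader exposes only the configured channel.
  • Rows of loci_df correspond 1:1 to the variant axis of H and should be joinable (by chrom/pos) to the variant_df used for genotypes.
  • Sample order in genotype_df, phenotype_df, and covariates_df should match H; passing select_samples to RFMixReader, as shown above, aligns H to the genotype columns, and the other tables can be reindexed to the same order.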
  • The cis-mapping helpers default to preload_haplotypes=True so ancestry blocks are staged as contiguous tensors on the requested device (GPU or CPU). Override this flag when working under strict memory constraints.
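These expectations can be checked before constructing CisMapper. The sketch below uses plain Python lists rather than DataFrames and hypothetical function names: it verifies that the number of loci_df rows matches H's variant axis, counts how many loci join to variant_df on (chrom, pos) after normalizing chromosome labels to strings (the same reason the example casts chrom with astype(str)), and checks sample order. It also estimates how much memory preloading H would take, assuming float32 (4 bytes per value); the dtype localQTL actually uses may differ.

```python
def check_ancestry_inputs(h_shape, loci, variants, genotype_samples, ancestry_samples):
    """Return (problems, n_loci_joined) for the documented input expectations.

    h_shape: (n_variants, n_samples[, n_ancestries]) of H.
    loci / variants: (chrom, pos) pairs from loci_df / variant_df.
    """
    problems = []
    if len(loci) != h_shape[0]:
        problems.append(f"loci_df has {len(loci)} rows but H has {h_shape[0]} variants")
    if h_shape[1] != len(ancestry_samples):
        problems.append("H sample axis does not match the ancestry sample list")
    if list(genotype_samples) != list(ancestry_samples):
        problems.append("sample order differs between genotypes and ancestry data")
    # Normalize chromosome labels so 1 and "1" join (cf. astype(str) above)
    variant_keys = {(str(chrom), int(pos)) for chrom, pos in variants}
    n_joined = sum((str(chrom), int(pos)) in variant_keys for chrom, pos in loci)
    return problems, n_joined


def preload_bytes(n_variants, n_samples, n_ancestries, bytes_per_value=4):
    """Rough size of H if staged as one dense array (float32 by default)."""
    return n_variants * n_samples * n_ancestries * bytes_per_value


problems, n_joined = check_ancestry_inputs(
    h_shape=(2, 3, 2),
    loci=[(1, 100), (1, 200)],                      # integer chromosome labels
    variants=[("1", 100), ("1", 200), ("1", 300)],  # string chromosome labels
    genotype_samples=["S1", "S2", "S3"],
    ancestry_samples=["S1", "S2", "S3"],
)
# 1,000,000 variants x 1,000 samples x 3 ancestries in float32 -> 12 GB
size_gb = preload_bytes(1_000_000, 1_000, 3) / 1e9
```

An estimate in the tens of gigabytes is a signal that preload_haplotypes=False may be the safer setting on the target device.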

Testing

Run the test suite with:

poetry run pytest

This exercises the core cis-QTL mapping routines using small synthetic datasets.



Download files


Source Distribution

localqtl-0.1.3.tar.gz (53.8 kB view details)

Uploaded Source

Built Distribution


localqtl-0.1.3-py3-none-any.whl (63.1 kB view details)

Uploaded Python 3

File details

Details for the file localqtl-0.1.3.tar.gz.

File metadata

  • Download URL: localqtl-0.1.3.tar.gz
  • Upload date:
  • Size: 53.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.2.0 CPython/3.10.9 Linux/4.18.0-553.22.1.el8_10.x86_64

File hashes

Hashes for localqtl-0.1.3.tar.gz
Algorithm Hash digest
SHA256 18a802d0ee659a04c707be2a36521af2e994504847fd8709a318f239a6c06e4b
MD5 854e627a7e1dc2d9089d82a1fbf13d01
BLAKE2b-256 678e448db7ed4ad791816179648fccf7fb56fda2d390f5cce15490cbdd294939


File details

Details for the file localqtl-0.1.3-py3-none-any.whl.

File metadata

  • Download URL: localqtl-0.1.3-py3-none-any.whl
  • Upload date:
  • Size: 63.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.2.0 CPython/3.10.9 Linux/4.18.0-553.22.1.el8_10.x86_64

File hashes

Hashes for localqtl-0.1.3-py3-none-any.whl
Algorithm Hash digest
SHA256 073aef53575e68db2a16fa4224255fa00468f5084c2232b5a23ba820a056b71f
MD5 debdc6cbcdf10513540c47a0f858ce1c
BLAKE2b-256 3f80494df8559751b4ab3e89eb565bca11cd12aa93423e00f77c48e5825fbbb3

