Add a short description here!

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

nmancuso

These details have not been verified by PyPI

Project description

jaxQTL

jaxQTL is a scalable software for large-scale eQTL mapping using count-based models!

We present jaxQTL for single-cell eQTL mapping using highly efficient count-based model (i.e., negative binomial or Poisson). Our software is implemented using Just-in-time (JIT) via JAX in Python, which generates and compiles heavily optimized C++ code in real time and operates seamlessly on CPU, GPU or TPU. jaxQTL is a command line tool. Please see example below and full documentation.

For preprint, please see:

Zhang, Z., Kim, A., Suboc, N., Mancuso, N., and Gazal, S. (2025). Efficient count-based models improve power and robustness for large-scale single-cell eQTL mapping. medRxiv (https://www.medrxiv.org/content/10.1101/2025.01.18.25320755v2)

We are currently working on more detailed documentations. Feel free to contact me (zzhang39@usc.edu) if you need help on running our tool and further analysis.

sc-eQTL model

jaxQTL requires pseudobulking by sum for each annotated cell type from the single-cell data. For a focal gene in a given cell type, jaxQTL can fit a count-based model, either Poisson or negative binomial (NB) model, between gene expression read count and a SNP variant as:

$$\log(E(y_i | X_i, g_i, l_i)) = g_i \beta_{g} + X_i \beta + \log(l_i)$$

where $i$ denotes individual $i$, $y_i$ is the pseudobulk read count for a given gene, $g_i$ is the genotype of one SNP variant (e.g., allele count of alternative alleles), $X_i$ is the covariate vector (e.g., age, sex, genotype PCs, expression PCs), $l_i$ is the total read count for individual $i$ across all genes within this cell type, i.e., library size.

To account for overdispersion observed in single-cell count data, jaxQTL modeled the conditional variance as

$$Var(y_i | X_i, g_i, L_i) = \mu_i + \alpha \mu_i^2$$

where $\mu_i = E(y_i | X_i, g_i, l_i)$ and $\alpha$ is the overdispersion parameter in NB model. When $\alpha=0$, NB reduced to Poisson model.

For cis-eQTL mapping, we focus on estimating the SNP effect size, its standard error under specified model, and the test statistics for $\beta_g \ne 0$. While both the Wald test and score test are available in jaxQTL, we recommend using the score test to assess the nonzero cis-SNP effect $g$ on read counts $y$ for its improved computational efficiency.

Interpretation on genetic effect

Compared to linear model applied to normalized read counts, count-based model provides interpretation on the original count data scale. The effect sizes estimated by Poisson or NB model reflect a change in the transcription rate or proportion if including library size offsets.

Installation

Install the latest release from PyPI:

pip install jaxqtl

With uv, either add jaxqtl to a project:

uv add jaxqtl

or install it into the active virtual environment:

uv pip install jaxqtl

After installation, the command line interface is available as jaxqtl:

jaxqtl --help

Development install

To work from the repository, set up a local environment with uv:

git clone https://github.com/mancusolab/jaxqtl.git
cd jaxqtl

uv sync --frozen --extra dev

# (optional) install docs dependencies
uv sync --frozen --extra docs

# build docs locally
uv run zensical build --strict --clean
rm -rf site

Example

This section uses the example files in tutorial/input/ to run the current CLI on a small chr22/CD4_NC dataset. The examples below focus on 10 genes from the down-sampled OneK1K data (N=100).

input format

Four inputs are relevant for the examples below: genotypes, phenotypes, covariates, and an optional gene list.

Phenotypes: BED-like table. The first four columns must encode chromosome, start, end, and phenotype ID. The loader accepts common aliases such as #Chr and Geneid; tutorial/input/CD4_NC.N100.bed.gz is a working example.
Covariates: tab-delimited table with exactly one IID-like column such as iid or #iid. tutorial/input/donor_features.tsv matches the current reader.
Genotypes: PLINK1 BED/BIM/FAM with --bfile, PLINK2 PGEN/PVAR/PSAM with --pfile, indexed VCF/BCF with --vcf, or BGEN with --bgen. The examples below use the PLINK1 prefix tutorial/input/chr22_N100.
Gene list: optional single-column file with one gene ID per line. tutorial/input/genelist_10 restricts the run to 10 genes.

Important note for the phenotype file:
In order to adjust for library size correctly in count-based models, pass one of the following:

--set-offset-from-libsize when the phenotype file still contains the genes needed to compute library size on the fly
--offset ./tutorial/input/CD4_NC.N100.offset.tsv when you want to use a precomputed offset file

run jaxQTL in command line

The current CLI uses subcommands rather than a single --mode flag. The examples below show the direct interface that matches src/jaxqtl/cli.py.

To compute expression PCs ahead of time, use the dedicated compute-pcs command:

jaxqtl compute-pcs \
 --pheno ./tutorial/input/CD4_NC.N100.bed.gz \
 --covar ./tutorial/input/donor_features.tsv \
 --num-pcs 2 \
 --out ./tutorial/output/CD4_NC.N100.covar_with_expr_pcs.tsv

This writes a covariate table with ExprPC0, ExprPC1, ... appended to the original covariates. Use that file as --covar if you want expression PCs in the downstream scan.

For fast cis-eQTL mapping with permutation calibration:

data_path="./tutorial/input"
out_prefix="./tutorial/output/CD4_NC_chr22_genelist_10_jaxqtl_nb"

jaxqtl cis \
 --bfile "${data_path}/chr22_N100" \
 --covar "${data_path}/donor_features.tsv" \
 --pheno "${data_path}/CD4_NC.N100.bed.gz" \
 --gene-list "${data_path}/genelist_10" \
 --model nb \
 --test score \
 --set-offset-from-libsize \
 --normalize-covar \
 --nperm 1000 \
 --out "${out_prefix}"

For faster cis-eQTL mapping with SPA tail calibration and ACAT gene-level aggregation:

data_path="./tutorial/input"
out_prefix="./tutorial/output/CD4_NC_chr22_genelist_10_jaxqtl_nb"

jaxqtl cis \
 --bfile "${data_path}/chr22_N100" \
 --covar "${data_path}/donor_features.tsv" \
 --pheno "${data_path}/CD4_NC.N100.bed.gz" \
 --gene-list "${data_path}/genelist_10" \
 --model nb \
 --test score \
 --spa \
 --acat \
 --set-offset-from-libsize \
 --normalize-covar \
 --out "${out_prefix}"

This writes ./tutorial/output/CD4_NC_chr22_genelist_10_jaxqtl_nb.cis.score.spa.acat.parquet.gz.

For a nominal cis scan over the same genes:

data_path="./tutorial/input"
out_prefix="./tutorial/output/CD4_NC_chr22_genelist_10_jaxqtl_nb"

jaxqtl nominal \
 --bfile "${data_path}/chr22_N100" \
 --covar "${data_path}/donor_features.tsv" \
 --pheno "${data_path}/CD4_NC.N100.bed.gz" \
 --gene-list "${data_path}/genelist_10" \
 --model nb \
 --test wald \
 --set-offset-from-libsize \
 --normalize-covar \
 --out "${out_prefix}"

For all available flags, please use jaxqtl -h.

output format

See example outputs in ./tutorial/output. The current CLI writes Parquet files:

jaxqtl cis ... writes ${out}.cis.{test}.{perm|acat}.parquet.gz
jaxqtl nominal ... writes ${out}.nominal.{test}.parquet.gz
jaxqtl trans ... writes ${out}.trans.{test}.variant.info.parquet.gz and ${out}.trans.{test}.sumstats.parquet.gz

The cis output contains:

phenotype_id, chrom, num_var, snp, a1, a0, pos, tss_distance, af, ma_count
beta, se, pvalue, pvalue_adj, adj_method, nb_alpha, model_converged
shape1, shape2, nc_estimate, perm_converged when using permutation-based calibration rather than --acat

The nominal output contains per-variant statistics for each tested gene:

phenotype_id, chrom, snp, pos, a1, a0, tss_distance, af, ma_count
beta, se, pvalue, nb_alpha, model_converged

Genome-wide sc-eQTL mapping

General advice

To efficiently run sc-eQTL mapping genome-wide using jaxQTL count-based models especially for large cohort (N>1000), we recommend run jaxQTL by chromosome and break genes (total ~20,000) into chunks, with chunk size of 200 - 300 genes. For our cis-eQTL analysis on OneK1K (max N=982), the run time for each chunk of 50 genes is around 1 -2 hrs depending on which chromosome (chr6 usually takes longer). In this way, we can distribute the tasks over multiple nodes in parallel using job scheduler on HPC. You may consider increase the chunk size to maximize efficiency on CPU node or even more if using GPU (50 genes takes ~20 mins).

If sample size is small (e.g., N=200), you may consider not splitting by chunks because it's fast enough. We recommend trying a few genes and check the log file to estimate the run time.

The repository includes ./tutorial/code/run_jaxqtl_cis_all.sh as a batch-workflow template. Use the current jaxqtl cis ... invocation shown above inside your scheduler script. Please see below for detailed instructions:

1. Organize directory structure

Suppose we are located in the working directory of a project. Here is an example of directory structure:

.
├── code/
├── data/
│   ├── features/
│   │   └── donor_features.tsv
│   ├── geno/
│   │   ├── chr1.bed
│   │   ├── chr1.bim
│   │   └── chr1.fam
│   │   └── ...
│   └── pheno/
│       ├── B_IN.bed.gz
│       ├── CD4_NC.bed.gz
│       └── CD8_NC.bed.gz
│       └── ...
└── result/
    └── cis/

2. Create gene list chunks and output directories

Assuming the directory structure as above, we provide a script for creating

gene list directories and gene list files /data/genelist/${celltype}/chr{chr_idx}/*, for example:

./data/genelist/CD4_NC/chr1
├── chunk_1
├── chunk_2
├── chunk_3
...

a parameter file for the example shell script to distribute jobs

CD4_NC	1	chunk_1
CD4_NC	1	chunk_2
....
CD4_NC	2	chunk_1
CD4_NC	2	chunk_2
...

result directories for the corresponding cell type and chromosomes /result/cis/${celltype}/chr{chr_idx}/*

./result/cis/CD4_NC/
├── chr1/
├── chr2/
├── chr3/
...

The repository also includes ./tutorial/code/create_genelist_dir.R as one way to generate these directories. Then run Rscript create_genelist_dir.R

3. Run jaxQTL on HPC using array jobs

We used slurm job schedule on our HPC. An example sbatch script can be found in ./tutorial/code/run_jaxqtl_cis_all.sh
To submit jobs, use sbatch run_jaxqtl_cis_all.sh

4. Collect results

After all cis-eQTL mapping are completed, you can prepare results for analysis by:

combine all chunk results from one cell type into one single file
filter by converged GLM model model_converged > 0 and, when using permutation-based calibration, converged adjustment fits perm_converged > 0
calculate FDR-controlled pvalues on pvalue_adj using qvalue method in R
identify eGenes (genes with at least one eQTL) by qvalue < FDR level, e.g., 0.05

Note

This project has been set up using PyScaffold 4.4. For details and usage information on PyScaffold see https://pyscaffold.org/.

Project details

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

nmancuso

These details have not been verified by PyPI

Release history Release notifications | RSS feed

This version

0.1.0

Jun 24, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

jaxqtl-0.1.0.tar.gz (49.5 kB view details)

Uploaded Jun 24, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

jaxqtl-0.1.0-py3-none-any.whl (62.0 kB view details)

Uploaded Jun 24, 2026 Python 3

File details

Details for the file jaxqtl-0.1.0.tar.gz.

File metadata

Download URL: jaxqtl-0.1.0.tar.gz
Upload date: Jun 24, 2026
Size: 49.5 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for jaxqtl-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`eb6f3ac0f56378fa631377cd94d1cc25e25d4d8a5b85d4ff711b7681891307fa`
MD5	`22aae00b8859606875950d4d6617713e`
BLAKE2b-256	`dbb4a71e620844e5cc568d8a8876ce6a006b7d80ec08717c73a6cf04a86cfae8`

See more details on using hashes here.

Provenance

The following attestation bundles were made for jaxqtl-0.1.0.tar.gz:

Publisher: release.yml on mancusolab/jaxqtl

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: jaxqtl-0.1.0.tar.gz
- Subject digest: eb6f3ac0f56378fa631377cd94d1cc25e25d4d8a5b85d4ff711b7681891307fa
- Sigstore transparency entry: 1944158336
- Sigstore integration time: Jun 24, 2026
Source repository:
- Permalink: mancusolab/jaxqtl@d9072dbfaefae3bd8ba96a063fe774a4df224fa9
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/mancusolab
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@d9072dbfaefae3bd8ba96a063fe774a4df224fa9
- Trigger Event: release

File details

Details for the file jaxqtl-0.1.0-py3-none-any.whl.

File metadata

Download URL: jaxqtl-0.1.0-py3-none-any.whl
Upload date: Jun 24, 2026
Size: 62.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for jaxqtl-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`6c5198ad99b732d523892342bc2903bf45ee8d84db97ca1b32b9f3ffe483ab3d`
MD5	`ed016f7c89cabc64dff5f0f199778637`
BLAKE2b-256	`b1aa526ca0872f8e50f8c2be7de26c3f8906eae5d0153571de8a86444ce937f2`

See more details on using hashes here.

Provenance

The following attestation bundles were made for jaxqtl-0.1.0-py3-none-any.whl:

Publisher: release.yml on mancusolab/jaxqtl

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: jaxqtl-0.1.0-py3-none-any.whl
- Subject digest: 6c5198ad99b732d523892342bc2903bf45ee8d84db97ca1b32b9f3ffe483ab3d
- Sigstore transparency entry: 1944158447
- Sigstore integration time: Jun 24, 2026
Source repository:
- Permalink: mancusolab/jaxqtl@d9072dbfaefae3bd8ba96a063fe774a4df224fa9
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/mancusolab
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@d9072dbfaefae3bd8ba96a063fe774a4df224fa9
- Trigger Event: release

jaxqtl 0.1.0

Navigation

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Project description

jaxQTL

sc-eQTL model

Interpretation on genetic effect

Installation

Development install

Example

input format

run jaxQTL in command line

output format

Genome-wide sc-eQTL mapping

General advice

1. Organize directory structure

2. Create gene list chunks and output directories

3. Run jaxQTL on HPC using array jobs

4. Collect results

Note

Project details

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance