A Python package for QTL detection based on machine learning

Project description

🧬 mlQTL: Machine Learning for Quantitative Trait Loci Mapping

mlQTL is a gene-centric machine learning framework for genome-wide QTL detection. It models the relationship between genomic variants and phenotypes at the gene level, capturing nonlinear effects and weak-effect loci. A sliding window strategy aggregates gene-level signals to identify high-confidence QTL regions and prioritize candidate causal variants. mlQTL is released as an open-source Python toolkit for high-throughput, reproducible genetic analysis and molecular breeding research.

⚙️ Features

Gene-level QTL detection: Uses SNPs from any genomic regions within genes to model gene-phenotype associations.
Multiple regression models: Decision Tree, Random Forest, and Support Vector Regression; additional models and encoding schemes can be customized.
Sliding window analysis: Aggregates gene scores into window scores for robust QTL detection.
SNP prioritization: Feature importance scores quantify contributions of individual SNPs for fine-scale variant prioritization.
Scalable and efficient: Supports large datasets with multi-process parallelism.
Flexible workflow: Provides command-line interface and Python API with customizable parameters, visualization, and output options.
Open-source and reproducible: Available on GitHub with example datasets and documentation.
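
The sliding-window idea from the feature list can be illustrated with a small sketch. This is not mlQTL's implementation: the function name `window_scores`, the window size, the step, and the use of a plain mean are assumptions chosen for illustration, and the genes are assumed to be already ordered by position on one chromosome.

```python
from statistics import mean


def window_scores(gene_scores, size=3, step=1):
    """Average per-gene scores over consecutive windows of `size` genes.

    Hypothetical helper: the aggregation (a plain mean) and the default
    window parameters are illustrative assumptions, not mlQTL's own.
    """
    if size < 1 or step < 1:
        raise ValueError("size and step must be positive")
    windows = []
    # Slide over the ordered gene list; stop when a full window no longer fits.
    for start in range(0, len(gene_scores) - size + 1, step):
        chunk = gene_scores[start:start + size]
        windows.append((start, mean(chunk)))
    return windows


# Five genes in chromosomal order with made-up scores.
scores = [0.1, 0.2, 0.9, 0.8, 0.1]
best_start, best_score = max(window_scores(scores), key=lambda w: w[1])
print(best_start, round(best_score, 3))
```

A single high-scoring gene raises every window that contains it, so neighbouring windows tend to peak together around a candidate region.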

📦 Installation

We highly recommend using a virtual environment to prevent dependency conflicts.

# Create and activate a virtual environment (optional but recommended)
python -m venv venv
source venv/bin/activate

Install with pip (Recommended)

Install the latest version directly from PyPI:

pip install mlqtl

Warning: As of version 2.3.0, NumPy no longer supports Linux systems with a glibc version below 2.28. If you are on an older Linux system, use one of the following installation methods:

# Allow only prebuilt wheels for NumPy, so pip selects a release compatible with your system
pip install mlqtl --only-binary=numpy

# Or pin a compatible NumPy version while installing mlqtl
pip install numpy==2.2.6 mlqtl
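
To find out whether the glibc note applies to your machine, a short check like the following may help. It is a sketch using only the standard library; the helper names `parse_version` and `glibc_too_old` are made up for this example, and the 2.28 threshold is taken from the warning above.

```python
import platform


def parse_version(text):
    """Turn a dotted version such as '2.28' into a tuple of ints."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def glibc_too_old(version, minimum="2.28"):
    """Return True when `version` is below the minimum glibc given above."""
    return parse_version(version) < parse_version(minimum)


# platform.libc_ver() returns ('glibc', '<version>') on glibc-based Linux
# and empty strings where it cannot identify the C library.
lib, version = platform.libc_ver()
if lib == "glibc" and version:
    print("glibc", version, "-> workaround needed:", glibc_too_old(version))
else:
    print("glibc not detected; the note above may not apply")
```

Versions are compared as integer tuples rather than as floats, so that 2.9 is correctly treated as older than 2.28.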

Install from Source

Download the Source Code

# Clone from GitHub
git clone https://github.com/huanglab-cbi/mlqtl.git

# Or download the source archive from our website and unpack it
wget https://cbi.njau.edu.cn/mlqtl/doc/download/source_code.tar.gz
tar -xzf source_code.tar.gz

Navigate to the Directory
```
cd mlqtl
```
Install Dependencies
```
pip install -r requirements.txt
```
Install the Package
```
pip install .
```

🚀 Usage

mlQTL requires genotype data in PLINK binary format (.bed, .bim, .fam). If your data is in VCF format, convert it with plink first.
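
Before starting an analysis, it can be worth confirming that all three files of the PLINK fileset are present. The snippet below is a minimal sketch; `missing_plink_files` is a hypothetical helper, and the prefix `imputed` is simply the one used in the walkthrough.

```python
from pathlib import Path


def missing_plink_files(prefix):
    """Return the PLINK binary files that are absent for `prefix`."""
    # Append suffixes to the string so dotted prefixes are not truncated.
    candidates = [Path(f"{prefix}{ext}") for ext in (".bed", ".bim", ".fam")]
    return [str(path) for path in candidates if not path.is_file()]


missing = missing_plink_files("imputed")
if missing:
    print("Missing:", ", ".join(missing))
else:
    print("PLINK fileset is complete")
```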

The primary CLI tool provides several commands:

❯ mlqtl --help
Usage: mlqtl [OPTIONS] COMMAND [ARGS]...

  mlQTL: Machine Learning for QTL Analysis

Options:
  --help  Show this message and exit.

Commands:
  gff2range   Convert GFF3 file to plink gene range format
  gtf2range   Convert GTF file to plink gene range format
  importance  Calculate feature importance and plot bar chart
  run         Run mlQTL analysis

For detailed instructions and API usage, please see the full documentation.

🧪 Example Walkthrough

Step 1: Download Sample Data

Visit the download page to get imputed_base_filtered_v0.7.vcf.gz, gene_location_range.txt, and grain_length.txt. Alternatively, use the following commands to download them:

wget https://cbi.njau.edu.cn/mlqtl/doc/download/imputed_base_filtered_v0.7.vcf.gz
wget https://cbi.njau.edu.cn/mlqtl/doc/download/gene_location_range.txt
wget https://cbi.njau.edu.cn/mlqtl/doc/download/grain_length.txt

Note: gene_location_range.txt is generated from the GFF file of the reference genome. For details, refer to the documentation.
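
As a rough illustration of how a GFF3 gene record maps to one line of a PLINK range file (chromosome, start, end, label), consider the sketch below. It is not the code behind `mlqtl gff2range`: the function name `gff3_gene_to_range` and the choice of the `ID` attribute as the label are assumptions, and the example record is made up.

```python
def gff3_gene_to_range(line):
    """Map one GFF3 line to 'chrom start end gene_id', or None if not a gene."""
    if line.startswith("#") or not line.strip():
        return None
    fields = line.rstrip("\n").split("\t")
    # A GFF3 feature line has nine tab-separated columns; column 3 is the type.
    if len(fields) != 9 or fields[2] != "gene":
        return None
    chrom, start, end, attributes = fields[0], fields[3], fields[4], fields[8]
    # Attributes are 'key=value' pairs separated by semicolons.
    pairs = dict(item.split("=", 1) for item in attributes.split(";") if "=" in item)
    gene_id = pairs.get("ID")
    if gene_id is None:
        return None
    return f"{chrom} {start} {end} {gene_id}"


# Made-up record for illustration; coordinates are not from a real annotation.
example = "chr3\tsource\tgene\t1000\t2500\t.\t+\t.\tID=geneA;Name=demo"
print(gff3_gene_to_range(example))
```

For real data, prefer the `gff2range` and `gtf2range` commands, which handle the details of the actual annotation files.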

Step 2: Preprocess the Data

Convert the VCF file to plink's binary format.

# Define the VCF file variable
vcf=imputed_base_filtered_v0.7.vcf.gz

# Run plink to convert and filter the data
plink --vcf ${vcf} \
      --snps-only \
      --allow-extra-chr \
      --make-bed \
      --double-id \
      --vcf-half-call m \
      --extract range gene_location_range.txt \
      --out imputed
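
After the conversion, a quick sanity check on the output can catch an empty or truncated fileset early. The sketch below relies on two facts about the PLINK text files: a .fam file has one line per sample, and a .bim file has one line per variant with the chromosome in the first column. The helper name `summarize_fileset` is made up for this example.

```python
from collections import Counter
from pathlib import Path


def summarize_fileset(prefix):
    """Count samples (.fam lines) and variants per chromosome (.bim lines)."""
    fam_lines = Path(f"{prefix}.fam").read_text().splitlines()
    n_samples = sum(1 for line in fam_lines if line.strip())
    per_chrom = Counter()
    with open(f"{prefix}.bim") as bim:
        for line in bim:
            if line.strip():
                # First whitespace-separated column of a .bim row is the chromosome.
                per_chrom[line.split()[0]] += 1
    return n_samples, per_chrom


# Only run the summary when the walkthrough output is actually present.
if Path("imputed.fam").is_file() and Path("imputed.bim").is_file():
    samples, variants = summarize_fileset("imputed")
    print(samples, "samples;", sum(variants.values()), "variants")
```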

Step 3: Run mlQTL Analysis

1. Run Analysis

mlqtl run -g imputed \
          -p grain_length.txt \
          -r gene_location_range.txt \
          -j 64 \
          -o result

2. Calculate SNP Importance

mlqtl importance -g imputed \
                 -p grain_length.txt \
                 -r gene_location_range.txt \
                 --trait grain_length \
                 --gene Os03g0407400 \
                 -m DecisionTreeRegressor \
                 -o result

📊 Performance Benchmark

The -j option sets the number of parallel processes. In the benchmark below, runtime falls from 5.5 h with one process to 6 min with 64, while memory use rises from 1.76G to 31.04G. The benchmarks were run on an AMD EPYC 7543 CPU.

| Processes | Memory | Time |
|---|---|---|
| 1 | 1.76G | 5.5h |
| 2 | 2.22G | 2.5h |
| 4 | 3.15G | 1h |
| 8 | 5G | 35min |
| 16 | 8.74G | 19min |
| 32 | 16.18G | 10min |
| 64 | 31.04G | 6min |

Please select an appropriate number of processes based on your system's resources.
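
One way to turn the table into a starting value for -j is to pick the largest benchmarked process count that fits both your memory budget and your core count. The sketch below does this under two assumptions: the memory column is read as gigabytes, and your dataset behaves like the benchmarked one, which may not hold for larger or smaller inputs. `pick_processes` is a hypothetical helper.

```python
# (processes, memory in G) copied from the benchmark table above.
BENCHMARK = [(1, 1.76), (2, 2.22), (4, 3.15), (8, 5.0),
             (16, 8.74), (32, 16.18), (64, 31.04)]


def pick_processes(memory_budget_gb, cpu_cores):
    """Largest benchmarked process count within the memory and core limits."""
    fitting = [procs for procs, mem in BENCHMARK
               if mem <= memory_budget_gb and procs <= cpu_cores]
    # Fall back to a single process when even the smallest row does not fit.
    return max(fitting, default=1)


print(pick_processes(memory_budget_gb=16, cpu_cores=32))
```

With a 16 G budget the 32-process row (16.18G) is just out of reach, so the sketch settles on 16 processes.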

Project details

Release history

0.2.2 (this version): Mar 13, 2026
0.2.1: Mar 13, 2026
0.2.0: Mar 13, 2026
0.1.8: Jan 16, 2026
0.1.6: Jan 16, 2026
0.1.5: Jun 29, 2025
0.1.4: Jun 13, 2025
0.1.3: Jun 8, 2025

Download files

Source Distribution

mlqtl-0.2.2.tar.gz (26.1 kB), uploaded Mar 13, 2026, Source

Built Distribution

mlqtl-0.2.2-py3-none-any.whl (28.2 kB), uploaded Mar 13, 2026, Python 3

File details

Details for the file mlqtl-0.2.2.tar.gz.

File metadata

Download URL: mlqtl-0.2.2.tar.gz
Upload date: Mar 13, 2026
Size: 26.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.7.5

File hashes

Hashes for mlqtl-0.2.2.tar.gz
Algorithm	Hash digest
SHA256	`b281d085f7284370a008642b3ce2d54d1e372acb396b6f1540c97637d4356aad`
MD5	`0dd131d5ee5a4014837499a81df6c5d9`
BLAKE2b-256	`9f42c55b53aae21473bacce0e76e4b70bc9456090713b7bc6cf8a0f03229d858`
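
A downloaded archive can be checked against the SHA256 digest listed above with the standard library alone. This is a minimal sketch; `sha256_matches` is a made-up helper name, and the check only runs if the file is present in the current directory.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_matches(path, expected_hex):
    """Compare a file's SHA256 digest with the published hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large files need not fit in memory.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return hmac.compare_digest(digest.hexdigest(), expected_hex.lower())


archive = Path("mlqtl-0.2.2.tar.gz")
published = "b281d085f7284370a008642b3ce2d54d1e372acb396b6f1540c97637d4356aad"
if archive.is_file():
    print("SHA256 OK" if sha256_matches(archive, published) else "SHA256 MISMATCH")
```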

File details

Details for the file mlqtl-0.2.2-py3-none-any.whl.

File metadata

Download URL: mlqtl-0.2.2-py3-none-any.whl
Upload date: Mar 13, 2026
Size: 28.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.7.5

File hashes

Hashes for mlqtl-0.2.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b3951000ebf25c2421cc529b37b1393dbe52b10e5231dcde0e3ff47d3f0eafdd`
MD5	`1d455296be42406d1020f6d4b4f2cb1c`
BLAKE2b-256	`f67d9e4c119f115edd35ba67eac57dec411416b5bdf8161dd1274c2e65775503`
