A Python package for QTL detection based on machine learning
🧬 mlQTL: Machine Learning for Quantitative Trait Loci Mapping
mlQTL is a gene-centric machine learning framework for genome-wide QTL detection. It models the relationship between genomic variants and phenotypes at the gene level, capturing nonlinear effects and weak-effect loci. A sliding window strategy aggregates gene-level signals to identify high-confidence QTL regions and prioritize candidate causal variants. mlQTL is released as an open-source Python toolkit for high-throughput, reproducible genetic analysis and molecular breeding research.
⚙️ Features
- Gene-level QTL detection: Uses SNPs from any genomic region within a gene to model gene-phenotype associations.
- Multiple regression models: Decision Tree, Random Forest, and Support Vector Regression; additional models and encoding schemes can be customized.
- Sliding window analysis: Aggregates gene scores into window scores for robust QTL detection.
- SNP prioritization: Feature importance scores quantify contributions of individual SNPs for fine-scale variant prioritization.
- Scalable and efficient: Supports large datasets with multi-process parallelism.
- Flexible workflow: Provides a command-line interface and a Python API with customizable parameters, visualization, and output options.
- Open-source and reproducible: Available on GitHub with example datasets and documentation.
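To make the sliding window idea concrete, here is a minimal sketch in plain Python. It assumes that a window score is the mean of the gene scores inside the window; this README does not state how mlQTL actually aggregates scores, so the function name, the averaging rule, and the example values are all illustrative.

```python
from statistics import mean

def sliding_window_scores(gene_scores, window_size=5, step=1):
    """Average consecutive gene scores inside each window.

    gene_scores: list of (gene_id, score) tuples ordered by genomic position.
    Returns a list of (first_gene, last_gene, window_score) tuples.
    """
    if window_size < 1 or step < 1:
        raise ValueError("window_size and step must be positive")
    windows = []
    for start in range(0, len(gene_scores) - window_size + 1, step):
        chunk = gene_scores[start:start + window_size]
        score = mean(s for _, s in chunk)
        windows.append((chunk[0][0], chunk[-1][0], score))
    return windows

# Hypothetical per-gene scores along one chromosome (not mlQTL output).
genes = [("g1", 0.1), ("g2", 0.2), ("g3", 0.9), ("g4", 0.8), ("g5", 0.1)]
windows = sliding_window_scores(genes, window_size=2)
best = max(windows, key=lambda w: w[2])
print(best)  # the window spanning g3 and g4 has the highest mean score
```

Averaging over neighbouring genes dampens a single noisy gene score, which is one plausible reason a window-based score can be more robust than a per-gene score.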
📦 Installation
We highly recommend using a virtual environment to prevent dependency conflicts.
# Create and activate a virtual environment (optional but recommended)
python -m venv venv
source venv/bin/activate
Install with pip (Recommended)
Install the latest version directly from PyPI:
pip install mlqtl
Warning: As of version 2.3.0, NumPy no longer supports Linux systems with a glibc version below 2.28. If you are on an older Linux system, please use one of the following installation methods:
# Force install using a binary wheel for NumPy
pip install mlqtl --only-binary=numpy
# Or, install a compatible version of NumPy before installing mlqtl
pip install numpy==2.2.6 mlqtl
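If you are unsure which glibc version your system has, the following sketch checks it with the standard library's `platform.libc_ver()`. The helper names `parse_version` and `needs_numpy_workaround` are made up for this example, and the 2.28 threshold is taken from the warning above.

```python
import platform

def parse_version(text):
    """Turn '2.17' into (2, 17); non-numeric characters are ignored."""
    parts = []
    for piece in text.split("."):
        digits = "".join(ch for ch in piece if ch.isdigit())
        if digits:
            parts.append(int(digits))
    return tuple(parts)

def needs_numpy_workaround(libc_name, libc_version, minimum=(2, 28)):
    """Return True when the system glibc is older than `minimum`."""
    if libc_name != "glibc" or not libc_version:
        return False  # not glibc (e.g. macOS, Windows, musl) or unknown
    return parse_version(libc_version) < minimum

libc_name, libc_version = platform.libc_ver()
if needs_numpy_workaround(libc_name, libc_version):
    print(f"glibc {libc_version} is below 2.28: use one of the workarounds above")
else:
    print("No glibc-related NumPy workaround appears necessary")
```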
Install from Source
1. Download the Source Code
# Clone from GitHub
git clone https://github.com/huanglab-cbi/mlqtl.git
# Or download from our website
wget https://cbi.njau.edu.cn/mlqtl/doc/download/source_code.tar.gz
2. Navigate to the Directory
cd mlqtl
3. Install Dependencies
pip install -r requirements.txt
4. Install the Package
pip install .
🚀 Usage
mlQTL requires genotype data in the plink binary format (.bed, .bim, .fam). If your data is in VCF format, you must first convert it using plink.
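Before starting an analysis, it can help to confirm that all three files of the plink fileset are present. The sketch below does this with `pathlib`; the function name `missing_plink_files` is hypothetical, and the prefix `imputed` is simply the one used in the walkthrough later in this README.

```python
from pathlib import Path

PLINK_SUFFIXES = (".bed", ".bim", ".fam")

def missing_plink_files(prefix):
    """Return the fileset members that do not exist for a plink prefix."""
    # Suffixes are appended to the string form so that prefixes containing
    # dots (e.g. "data.v1") are not truncated by Path.with_suffix.
    return [
        f"{prefix}{suffix}"
        for suffix in PLINK_SUFFIXES
        if not Path(f"{prefix}{suffix}").is_file()
    ]

missing = missing_plink_files("imputed")
if missing:
    print("Missing plink files:", ", ".join(missing))
else:
    print("plink fileset is complete")
```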
The primary CLI tool provides several commands:
❯ mlqtl --help
Usage: mlqtl [OPTIONS] COMMAND [ARGS]...

  mlQTL: Machine Learning for QTL Analysis

Options:
  --help  Show this message and exit.

Commands:
  gff2range   Convert GFF3 file to plink gene range format
  gtf2range   Convert GTF file to plink gene range format
  importance  Calculate feature importance and plot bar chart
  run         Run mlQTL analysis
For detailed instructions and API usage, please see the full documentation.
🧪 Example Walkthrough
Step 1: Download Sample Data
Visit the download page to get imputed_base_filtered_v0.7.vcf.gz, gene_location_range.txt, and grain_length.txt.
Alternatively, use the following commands to download them:
wget https://cbi.njau.edu.cn/mlqtl/doc/download/imputed_base_filtered_v0.7.vcf.gz
wget https://cbi.njau.edu.cn/mlqtl/doc/download/gene_location_range.txt
wget https://cbi.njau.edu.cn/mlqtl/doc/download/grain_length.txt
Note: The gene_location_range.txt file is generated from the GFF file of the reference genome. For details, please refer to the documentation.
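For intuition, the sketch below shows one way gene ranges could be derived from a GFF3 file. It assumes the four-column layout that plink's `--extract range` reads (chromosome, start, end, label) and keeps only `gene` features; the function name is hypothetical, the sample coordinates are made up, and the output of mlQTL's own `gff2range` command may differ.

```python
import io

def gff3_genes_to_ranges(lines):
    """Yield (chrom, start, end, gene_id) for every 'gene' feature."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "gene":
            continue
        # Column 9 holds semicolon-separated key=value attributes.
        attrs = dict(
            item.split("=", 1) for item in cols[8].split(";") if "=" in item
        )
        gene_id = attrs.get("ID")
        if gene_id:
            yield cols[0], int(cols[3]), int(cols[4]), gene_id

# Hypothetical two-feature GFF3 snippet; coordinates are made up.
example = io.StringIO(
    "##gff-version 3\n"
    "chr03\tsrc\tgene\t1000\t2500\t.\t+\t.\tID=geneA;Name=demo\n"
    "chr03\tsrc\tmRNA\t1000\t2500\t.\t+\t.\tID=geneA.1;Parent=geneA\n"
)
ranges = list(gff3_genes_to_ranges(example))
for chrom, start, end, gene_id in ranges:
    print(chrom, start, end, gene_id, sep="\t")
```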
Step 2: Preprocess the Data
Convert the VCF file to plink's binary format.
# Define the VCF file variable
vcf=imputed_base_filtered_v0.7.vcf.gz
# Run plink to convert and filter the data
plink --vcf ${vcf} \
--snps-only \
--allow-extra-chr \
--make-bed \
--double-id \
--vcf-half-call m \
--extract range gene_location_range.txt \
--out imputed
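If you prefer to drive this step from a Python script, the same command can be assembled as an argument list. This is only a convenience sketch: `build_plink_command` is a hypothetical helper, and every flag in it is copied from the shell command above.

```python
import shlex

def build_plink_command(vcf, range_file, out_prefix, plink="plink"):
    """Return the argument list for the VCF-to-BED conversion shown above."""
    return [
        plink,
        "--vcf", vcf,
        "--snps-only",
        "--allow-extra-chr",
        "--make-bed",
        "--double-id",
        "--vcf-half-call", "m",
        "--extract", "range", range_file,
        "--out", out_prefix,
    ]

cmd = build_plink_command(
    "imputed_base_filtered_v0.7.vcf.gz", "gene_location_range.txt", "imputed"
)
print(shlex.join(cmd))
# To execute it: subprocess.run(cmd, check=True)  (requires plink on PATH)
```

Passing a list rather than a single string avoids shell quoting problems when file names contain spaces.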
Step 3: Run mlQTL Analysis
1. Run Analysis
mlqtl run -g imputed \
-p grain_length.txt \
-r gene_location_range.txt \
-j 64 \
-o result
2. Calculate SNP Importance
mlqtl importance -g imputed \
-p grain_length.txt \
-r gene_location_range.txt \
--trait grain_length \
--gene Os03g0407400 \
-m DecisionTreeRegressor \
-o result
📊 Performance Benchmark
The -j option sets the number of parallel processes. In the benchmark below, runtime falls as processes are added, while memory use rises with the process count. The benchmarks were run on an AMD EPYC 7543 CPU.
| Processes | Memory | Time |
|---|---|---|
| 1 | 1.76G | 5.5h |
| 2 | 2.22G | 2.5h |
| 4 | 3.15G | 1h |
| 8 | 5G | 35min |
| 16 | 8.74G | 19min |
| 32 | 16.18G | 10min |
| 64 | 31.04G | 6min |
Please select an appropriate number of processes based on your system's resources.
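As a rough guide, the table can be turned into a small lookup that picks the largest benchmarked process count fitting a memory budget and CPU count. The helper names are hypothetical, and the figures only describe the benchmark dataset on the benchmark machine, so treat the result as a starting point for your own data.

```python
import os

# (processes, memory in G as listed, runtime in minutes), from the table above.
BENCHMARK = [
    (1, 1.76, 330), (2, 2.22, 150), (4, 3.15, 60), (8, 5.0, 35),
    (16, 8.74, 19), (32, 16.18, 10), (64, 31.04, 6),
]

def pick_processes(memory_budget_g, cpu_count=None):
    """Largest benchmarked process count that fits memory and CPU limits."""
    cpu_count = cpu_count or os.cpu_count() or 1
    fitting = [
        procs for procs, mem_g, _ in BENCHMARK
        if mem_g <= memory_budget_g and procs <= cpu_count
    ]
    return max(fitting) if fitting else None

def speedup(processes):
    """Runtime with one process divided by runtime with `processes`."""
    runtimes = {procs: minutes for procs, _, minutes in BENCHMARK}
    return runtimes[1] / runtimes[processes]

print(pick_processes(memory_budget_g=12, cpu_count=32))  # 16
print(speedup(64))  # 55.0
```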