Variant filtering and GWAS analysis tool for plant genomics.
PlantVarFilter
PlantVarFilter is a Python toolkit designed for efficient filtering and annotation of plant genomic variants, enabling researchers to link genetic variants with phenotypic traits and perform preliminary genome-wide association studies (GWAS). It addresses challenges in handling large variant datasets and supports integrative analysis combining genomic and trait data.
⚠️ Requires Python 3.12+
Current Version: 0.1.0 — This is the first stable release.
Future releases aim to introduce advanced statistical models, automated reports, and interactive visualizations for plant genomics research.
Citations:
This tool is described in the following preprint:
Ahmed Yassin (2025). PlantVarFilter: A flexible tool for variant filtering and multi-trait GWAS analysis in plants. bioRxiv.
https://doi.org/10.1101/2025.07.02.662805
Please cite this work if you use PlantVarFilter in your research.
Features:
- Filter variants by consequence type (e.g., missense_variant, stop_gained, synonymous_variant, frameshift_variant).
- Include or exclude intergenic regions.
- Annotate variants with gene information from GFF3 files.
- Link genes with trait scores from CSV/TSV files.
- Perform basic GWAS analyses using t-tests and multiple linear regression.
- Generate summary plots including variant consequence distribution, variant type proportions, and Manhattan plots.
- Support for compressed input files (.gz).
- Configurable output formats: CSV, TSV, JSON, XLSX, Feather.
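As a rough illustration of the first two features, filtering by consequence type with an optional intergenic switch can be sketched in plain Python. This is not PlantVarFilter's internal code: the record layout (dicts with a `consequence` key), the helper name `filter_by_consequence`, and the `intergenic_variant` label are assumptions made for the example.

```python
from typing import Iterable, Iterator


def filter_by_consequence(
    records: Iterable[dict],
    consequence_types: Iterable[str],
    include_intergenic: bool = True,
) -> Iterator[dict]:
    """Yield records whose consequence is in the wanted set (hypothetical helper)."""
    wanted = set(consequence_types)
    # Assumption: intergenic variants carry the label "intergenic_variant".
    if include_intergenic:
        wanted.add("intergenic_variant")
    else:
        wanted.discard("intergenic_variant")
    for record in records:
        if record.get("consequence") in wanted:
            yield record


# Made-up records, for illustration only.
variants = [
    {"chrom": "Chr1", "pos": 1200, "consequence": "missense_variant"},
    {"chrom": "Chr1", "pos": 5300, "consequence": "intergenic_variant"},
    {"chrom": "Chr2", "pos": 880, "consequence": "synonymous_variant"},
    {"chrom": "Chr2", "pos": 910, "consequence": "stop_gained"},
]

kept = list(
    filter_by_consequence(
        variants,
        consequence_types=["missense_variant", "stop_gained"],
        include_intergenic=False,
    )
)
print([v["pos"] for v in kept])  # [1200, 910]
```

Writing the filter as a generator lets records stream through without holding a large variant file in memory.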
Project Structure
PlantVarFilter/
├── src/
│   └── plantvarfilter/
│       ├── __init__.py
│       ├── annotator.py
│       ├── cli.py
│       ├── filter.py
│       ├── parser.py
│       ├── regression_gwas.py
│       └── visualize.py
├── setup.py
├── README.md
└── LICENSE
Installation
pip install .
Make sure you have the following dependencies installed:
pandas, pyarrow, scipy, seaborn, matplotlib, numpy, scikit-learn
We recommend using a Python virtual environment:
python3 -m venv env
source env/bin/activate # Linux/macOS
env\Scripts\activate # Windows
pip install .
Usage
Initialize a new analysis project
plantvarfilter init /path/to/project
This creates the following structure:
- input/: for your input data files (VCF, GFF3, trait CSV)
- output/: for result files and plots
- config.json: template configuration file
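For readers who want to see what such a layout amounts to on disk, here is a minimal sketch that builds the same three entries with the standard library. The helper name `init_project` and the contents of `TEMPLATE_CONFIG` are assumptions for illustration. The template that `plantvarfilter init` actually writes may differ.

```python
import json
import tempfile
from pathlib import Path

# Assumed template values; the real template written by the CLI may differ.
TEMPLATE_CONFIG = {
    "vcf": "input/data.vcf.gz",
    "gff": "input/annotation.gff3.gz",
    "traits": "input/traits.csv",
    "output_format": "csv",
    "output_dir": "output/",
}


def init_project(root) -> Path:
    """Create input/, output/ and config.json under root (hypothetical helper)."""
    root = Path(root)
    (root / "input").mkdir(parents=True, exist_ok=True)
    (root / "output").mkdir(exist_ok=True)
    config_path = root / "config.json"
    # Leave an existing, possibly hand-edited config untouched.
    if not config_path.exists():
        config_path.write_text(json.dumps(TEMPLATE_CONFIG, indent=2) + "\n")
    return config_path


# Demonstrate in a throwaway directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    config_file = init_project(Path(tmp) / "demo_project")
    created = sorted(p.name for p in config_file.parent.iterdir())
    print(created)  # ['config.json', 'input', 'output']
```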
Run the full analysis pipeline
plantvarfilter run --config /path/to/project/config.json
Pipeline steps:
- Filter variants based on consequence types
- Annotate variants with genes
- Annotate variants with trait data
- Perform GWAS analysis (if enabled)
- Generate output files and plots
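The gene annotation step can be pictured as an interval lookup against the `gene` features of a GFF3 file. The sketch below relies only on the standard nine-column, tab-separated GFF3 layout with 1-based inclusive coordinates. The helper names `load_genes` and `annotate` and the sample records are hypothetical, and the code does not reproduce how `annotator.py` is actually written.

```python
def load_genes(gff_lines):
    """Collect (seqid, start, end, gene_id) tuples from GFF3 'gene' features."""
    genes = []
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "gene":
            continue
        # Column 9 holds semicolon-separated key=value attributes.
        attrs = dict(
            pair.split("=", 1) for pair in cols[8].split(";") if "=" in pair
        )
        genes.append((cols[0], int(cols[3]), int(cols[4]), attrs.get("ID", "")))
    return genes


def annotate(chrom, pos, genes):
    """Return IDs of genes overlapping a 1-based position; [] means intergenic."""
    return [
        gene_id
        for seqid, start, end, gene_id in genes
        if seqid == chrom and start <= pos <= end
    ]


# Made-up annotation records, for illustration only.
gff_text = (
    "##gff-version 3\n"
    "Chr1\texample\tgene\t1000\t2000\t.\t+\t.\tID=gene0001;Name=demoA\n"
    "Chr1\texample\tmRNA\t1000\t2000\t.\t+\t.\tID=mrna0001;Parent=gene0001\n"
    "Chr2\texample\tgene\t500\t900\t.\t-\t.\tID=gene0002\n"
)
genes = load_genes(gff_text.splitlines())
print(annotate("Chr1", 1200, genes))  # ['gene0001']
print(annotate("Chr1", 5300, genes))  # []
```

The linear scan keeps the example short. For genome-scale annotations, a sorted index or interval tree per chromosome would likely be a better fit.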
Generate plots from existing GWAS results
plantvarfilter plot-only --config /path/to/project/config.json
Requires the config file to include:
{
"plot_only": true,
"output_dir": "output/",
"gwas_results": "output/gwas_basic_results.csv"
}
Configuration File Example (config.json)
{
"vcf": "input/data.vcf.gz",
"gff": "input/annotation.gff3.gz",
"traits": "input/traits.csv",
"include_intergenic": true,
"consequence_types": [
"missense_variant",
"stop_gained",
"synonymous_variant"
],
"output_format": "csv",
"output_dir": "output/",
"plot": true,
"gwas": true
}
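Before launching a long run, it can help to sanity-check the configuration. The following sketch validates a config dictionary against the keys shown in the example and the output formats listed under Features. Which keys are mandatory, the `csv` default, and the helper name `check_config` are assumptions, not documented behaviour of the tool.

```python
import json

# Assumption: these keys are treated as mandatory for a full run.
REQUIRED_KEYS = ("vcf", "gff", "traits", "output_dir")
# Formats listed in the Features section of this README.
SUPPORTED_FORMATS = {"csv", "tsv", "json", "xlsx", "feather"}
BOOLEAN_FLAGS = ("include_intergenic", "plot", "gwas")


def check_config(config: dict) -> list[str]:
    """Return a list of human-readable problems; empty means no issue found."""
    problems = []
    for key in REQUIRED_KEYS:
        if key not in config:
            problems.append(f"missing key: {key}")
    fmt = str(config.get("output_format", "csv")).lower()
    if fmt not in SUPPORTED_FORMATS:
        problems.append(f"unsupported output_format: {fmt}")
    if not isinstance(config.get("consequence_types", []), list):
        problems.append("consequence_types must be a list")
    for flag in BOOLEAN_FLAGS:
        if flag in config and not isinstance(config[flag], bool):
            problems.append(f"{flag} must be true or false")
    return problems


example = json.loads("""
{
  "vcf": "input/data.vcf.gz",
  "gff": "input/annotation.gff3.gz",
  "traits": "input/traits.csv",
  "include_intergenic": true,
  "consequence_types": ["missense_variant", "stop_gained", "synonymous_variant"],
  "output_format": "csv",
  "output_dir": "output/",
  "plot": true,
  "gwas": true
}
""")
print(check_config(example))  # []

# A deliberately broken variant of the same config.
broken = {k: v for k, v in example.items() if k != "vcf"}
broken["output_format"] = "parquet"
print(check_config(broken))
```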
Output Files
- filtered_variants.csv: filtered and annotated variant dataset.
- gwas_basic_results.csv: GWAS association results with p-values.
- plots/ directory contains:
  - consequence_distribution.png
  - variant_type_pie.png
  - manhattan_plot.png
  - manhattan_plot_from_file.png (for plot-only mode)
- run.log: execution log.
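A results file with p-values can be post-processed outside the tool, for example to compute the -log10(p) values that a Manhattan plot displays. The sketch below is an assumption-laden illustration: the column names `chrom`, `pos` and `p_value` are hypothetical (check the header of your own gwas_basic_results.csv), and the Bonferroni threshold is a common convention chosen for the example, not something the tool is documented to apply.

```python
import csv
import io
import math


def neg_log10_pvalues(rows, p_column="p_value"):
    """Pair each row with -log10(p), skipping p-values outside (0, 1]."""
    scored = []
    for row in rows:
        p = float(row[p_column])
        if 0.0 < p <= 1.0:
            scored.append((row, -math.log10(p)))
    return scored


# Made-up results with hypothetical column names, for illustration only.
demo_csv = (
    "chrom,pos,p_value\n"
    "Chr1,1200,0.001\n"
    "Chr1,5300,0.5\n"
    "Chr2,910,0.00001\n"
)
rows = list(csv.DictReader(io.StringIO(demo_csv)))
scored = neg_log10_pvalues(rows)

# Bonferroni-corrected significance line at alpha = 0.05 (an assumed choice).
threshold = -math.log10(0.05 / len(rows))
hits = [(row["chrom"], int(row["pos"])) for row, score in scored if score > threshold]
print(hits)  # [('Chr1', 1200), ('Chr2', 910)]
```

To read the real file, replace `io.StringIO(demo_csv)` with an open file handle.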
Example Experiment Walkthrough
# Step 1: Create the project folder structure
plantvarfilter init ~/Desktop/PlantTestRun
# Step 2: Place your prepared input files in the input folder:
# - expanded_variants.vcf.gz
# - expanded_annotations.gff3.gz
# - expanded_traits.csv
# Step 3: Update the config.json as:
{
"vcf": "input/expanded_variants.vcf.gz",
"gff": "input/expanded_annotations.gff3.gz",
"traits": "input/expanded_traits.csv",
"include_intergenic": true,
"consequence_types": ["MODERATE", "HIGH", "LOW", "MODIFIER"],
"output_format": "csv",
"output": "output/filtered_variants.csv",
"plot": true,
"gwas": true,
"output_dir": "output/"
}
# Step 4: Run the full pipeline
plantvarfilter run --config ~/Desktop/PlantTestRun/config.json
# Step 5 (optional): to regenerate only the Manhattan plot from a modified GWAS CSV, set config.json to the following, then run plot-only:
{
"plot_only": true,
"output_dir": "output/",
"gwas_results": "output/gwas_basic_results.csv"
}
plantvarfilter plot-only --config ~/Desktop/PlantTestRun/config.json
Output Example
[Figures: Consequence Distribution, Variant Type Pie, Manhattan Plot]
Future Enhancements
- Support for advanced GWAS models
- Auto-generated PDF/HTML reports
- Interactive Streamlit-based UI
- REST API
- Unit testing and test datasets
License
MIT License. See LICENSE for details.
Author
- Ahmed Yassin, Computational Biologist
- ahmedyassin300@outlook.com