Variant filtering and GWAS analysis tool for plant genomics.

Project description

PlantVarFilter

PlantVarFilter is a Python toolkit designed for efficient filtering and annotation of plant genomic variants, enabling researchers to link genetic variants with phenotypic traits and perform preliminary genome-wide association studies (GWAS). It addresses challenges in handling large variant datasets and supports integrative analysis combining genomic and trait data.

⚠️ Requires Python 3.12+
Current version: 0.1.0 (first stable release).
Future releases aim to introduce advanced statistical models, automated reports, and interactive visualizations for plant genomics research.


Citation

This tool is described in the following preprint:

Ahmed Yassin (2025). PlantVarFilter: A flexible tool for variant filtering and multi-trait GWAS analysis in plants. bioRxiv.
https://doi.org/10.1101/2025.07.02.662805

Please cite this work if you use PlantVarFilter in your research.

Features

  • Filter variants by consequence type (e.g., missense_variant, stop_gained, synonymous_variant, frameshift_variant).
  • Include or exclude intergenic regions.
  • Annotate variants with gene information from GFF3 files.
  • Link genes with trait scores from CSV/TSV files.
  • Perform basic GWAS analyses using t-tests and multiple linear regression.
  • Generate summary plots including variant consequence distribution, variant type proportions, and Manhattan plots.
  • Support for compressed input files (.gz).
  • Configurable output formats: CSV, TSV, JSON, XLSX, Feather.
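To make the t-test feature concrete, here is a minimal sketch of a two-group comparison of trait scores. It assumes Welch's unequal-variance form and hypothetical carrier and non-carrier groups; the README does not state which t-test variant or grouping PlantVarFilter uses internally, so treat the function name and the data as illustrative only.

```python
import math
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Return (t statistic, approximate degrees of freedom) for Welch's t-test."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    # Squared standard error contributed by each group.
    se_a = variance(group_a) / n_a
    se_b = variance(group_b) / n_b
    if se_a + se_b == 0:
        raise ValueError("both groups have zero variance")
    t_stat = (mean(group_a) - mean(group_b)) / math.sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    dof = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t_stat, dof


# Hypothetical trait scores for samples with and without a given variant.
carriers = [5.1, 4.9, 5.6, 5.8, 5.3]
non_carriers = [4.2, 4.0, 4.5, 4.4, 4.1]
t_stat, dof = welch_t(carriers, non_carriers)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

With SciPy installed, scipy.stats.ttest_ind(carriers, non_carriers, equal_var=False) computes the same statistic and also returns a p-value.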

Project Structure

PlantVarFilter/
├── src/
│   └── plantvarfilter/
│       ├── __init__.py
│       ├── annotator.py
│       ├── cli.py
│       ├── filter.py
│       ├── parser.py
│       ├── regression_gwas.py
│       └── visualize.py
├── setup.py
├── README.md
└── LICENSE

Installation

From the root of the source tree (the directory containing setup.py):

pip install .

Make sure you have the following dependencies installed:

pandas, pyarrow, scipy, seaborn, matplotlib, numpy, scikit-learn

We recommend using a Python virtual environment:

python3 -m venv env
source env/bin/activate  # Linux/macOS
env\Scripts\activate   # Windows
pip install .
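Before running the tool, a quick check like the following can confirm that the dependencies listed above are importable in the active environment. It is a small sketch using only the standard library; the helper name missing_dependencies is illustrative and not part of PlantVarFilter.

```python
import sys
from importlib.util import find_spec

# Distribution name -> import name. The two differ for scikit-learn,
# which is imported as "sklearn".
REQUIRED = {
    "pandas": "pandas",
    "pyarrow": "pyarrow",
    "scipy": "scipy",
    "seaborn": "seaborn",
    "matplotlib": "matplotlib",
    "numpy": "numpy",
    "scikit-learn": "sklearn",
}


def missing_dependencies(required=REQUIRED):
    """Return the distribution names whose modules cannot be found."""
    return [dist for dist, module in required.items() if find_spec(module) is None]


if sys.version_info < (3, 12):
    print("Python 3.12 or newer is required.")

missing = missing_dependencies()
if missing:
    print("Missing packages: " + ", ".join(missing))
else:
    print("All listed dependencies are importable.")
```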

Usage

Initialize a new analysis project

plantvarfilter init /path/to/project

This creates the following structure:

  • input/ — for your input data files (VCF, GFF3, trait CSV)
  • output/ — for result files and plots
  • config.json — template configuration file

Run the full analysis pipeline

plantvarfilter run --config /path/to/project/config.json

Pipeline steps:

  • Filter variants based on consequence types
  • Annotate variants with genes
  • Annotate variants with trait data
  • Perform GWAS analysis (if enabled)
  • Generate output files and plots
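The first pipeline step can be pictured with the sketch below. It assumes the VCF carries SnpEff-style ANN annotations in the INFO column, where the second subfield holds the consequence term; the README does not say which annotation format PlantVarFilter expects, and the function names here are hypothetical rather than the package's API.

```python
import gzip


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    return gzip.open(path, "rt") if str(path).endswith(".gz") else open(path, "r")


def consequences(info):
    """Collect consequence terms from a SnpEff-style ANN entry (assumed format)."""
    terms = set()
    for entry in info.split(";"):
        if entry.startswith("ANN="):
            for annotation in entry[4:].split(","):
                fields = annotation.split("|")
                if len(fields) > 1:
                    # The second ANN subfield may join several terms with "&".
                    terms.update(fields[1].split("&"))
    return terms


def filter_vcf_lines(lines, wanted):
    """Yield header lines plus records with at least one wanted consequence."""
    wanted = set(wanted)
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        columns = line.rstrip("\n").split("\t")
        # INFO is the eighth VCF column.
        if len(columns) >= 8 and consequences(columns[7]) & wanted:
            yield line


# Made-up records for illustration only.
example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "Chr1\t100\t.\tA\tG\t50\tPASS\tANN=G|missense_variant|MODERATE|GeneA\n",
    "Chr1\t200\t.\tC\tT\t50\tPASS\tANN=T|intergenic_region|MODIFIER|\n",
]
kept = list(filter_vcf_lines(example, ["missense_variant", "stop_gained"]))
print(f"{len(kept)} lines kept")
```

For a real file, the same generator can be fed from open_text("input/data.vcf.gz"), which reads compressed and uncompressed inputs alike.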

Generate plots from existing GWAS results

plantvarfilter plot-only --config /path/to/project/config.json

Requires the config file to include:

{
  "plot_only": true,
  "output_dir": "output/",
  "gwas_results": "output/gwas_basic_results.csv"
}
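As a rough illustration of what a Manhattan plot needs from a results file, the sketch below turns rows into plot coordinates: a cumulative genome position on the x-axis and -log10(p) on the y-axis. The column names chrom, pos and p_value are assumptions made for this example; the actual header of gwas_basic_results.csv is not documented here and may differ.

```python
import csv
import io
import math


def manhattan_points(rows, chrom_key="chrom", pos_key="pos", p_key="p_value"):
    """Return (x, y, chromosome) tuples: cumulative position and -log10(p)."""
    records = [(r[chrom_key], int(r[pos_key]), float(r[p_key])) for r in rows]
    # Largest observed position per chromosome, used as an approximate length.
    lengths = {}
    for chrom, pos, _ in records:
        lengths[chrom] = max(lengths.get(chrom, 0), pos)
    # Shift each chromosome by the total length of those before it.
    # Note: sorted() orders names as text, so "Chr10" sorts before "Chr2".
    offsets, running = {}, 0
    for chrom in sorted(lengths):
        offsets[chrom] = running
        running += lengths[chrom]
    return [
        (offsets[chrom] + pos, -math.log10(p), chrom)
        for chrom, pos, p in records
        if p > 0  # log10 is undefined for p = 0
    ]


# Made-up results for illustration only.
sample_csv = "chrom,pos,p_value\nChr1,100,0.01\nChr1,500,0.5\nChr2,50,0.001\n"
points = manhattan_points(csv.DictReader(io.StringIO(sample_csv)))
for x, y, chrom in points:
    print(chrom, x, round(y, 2))
```

The resulting x and y values can be passed to a scatter plot, coloured by chromosome, to draw the figure.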

Configuration File Example (config.json)

{
  "vcf": "input/data.vcf.gz",
  "gff": "input/annotation.gff3.gz",
  "traits": "input/traits.csv",
  "include_intergenic": true,
  "consequence_types": [
    "missense_variant",
    "stop_gained",
    "synonymous_variant"
  ],
  "output_format": "csv",
  "output_dir": "output/",
  "plot": true,
  "gwas": true
}
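A config file can be sanity-checked before a run with a few lines of Python. The sketch below takes its keys and value types from the example above; which keys the tool treats as mandatory, and the exact spellings it accepts for output_format, are assumptions made for illustration.

```python
import json

# Keys and types mirror the example configuration; treating all of them
# as required is an assumption of this sketch.
EXPECTED_TYPES = {
    "vcf": str,
    "gff": str,
    "traits": str,
    "include_intergenic": bool,
    "consequence_types": list,
    "output_format": str,
    "output_dir": str,
    "plot": bool,
    "gwas": bool,
}
# Lower-case spellings of the formats named in the feature list (assumed).
SUPPORTED_FORMATS = {"csv", "tsv", "json", "xlsx", "feather"}


def check_config(config):
    """Return a list of human-readable problems found in a config dict."""
    problems = []
    for key, expected in EXPECTED_TYPES.items():
        if key not in config:
            problems.append(f"missing key: {key}")
        elif not isinstance(config[key], expected):
            problems.append(f"{key} should be of type {expected.__name__}")
    fmt = config.get("output_format")
    if isinstance(fmt, str) and fmt.lower() not in SUPPORTED_FORMATS:
        problems.append(f"unsupported output_format: {fmt}")
    return problems


config = json.loads("""
{
  "vcf": "input/data.vcf.gz",
  "gff": "input/annotation.gff3.gz",
  "traits": "input/traits.csv",
  "include_intergenic": true,
  "consequence_types": ["missense_variant", "stop_gained", "synonymous_variant"],
  "output_format": "csv",
  "output_dir": "output/",
  "plot": true,
  "gwas": true
}
""")
print(check_config(config) or "config looks consistent")
```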

Output Files

  • filtered_variants.csv — Filtered and annotated variant dataset.
  • gwas_basic_results.csv — GWAS association results with p-values.
  • plots/ directory contains:
    • consequence_distribution.png
    • variant_type_pie.png
    • manhattan_plot.png
    • manhattan_plot_from_file.png (for plot-only mode)
  • run.log — Execution log.
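After a run, a short script can confirm that the files listed above were produced. This sketch assumes that every file, including run.log, sits under the configured output directory and that the output format is CSV; both are assumptions, and the helper name is illustrative.

```python
from pathlib import Path

# File names taken from the list above; their location relative to the
# output directory is assumed.
EXPECTED_OUTPUTS = [
    "filtered_variants.csv",
    "gwas_basic_results.csv",
    "plots/consequence_distribution.png",
    "plots/variant_type_pie.png",
    "plots/manhattan_plot.png",
    "run.log",
]


def missing_outputs(output_dir, expected=EXPECTED_OUTPUTS):
    """Return the expected files that do not exist under output_dir."""
    base = Path(output_dir)
    return [name for name in expected if not (base / name).is_file()]


missing = missing_outputs("output")
if missing:
    print("Not found: " + ", ".join(missing))
else:
    print("All expected output files are present.")
```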

Example Experiment Walkthrough

# Step 1: Create the project folder structure
plantvarfilter init ~/Desktop/PlantTestRun

# Step 2: Place your prepared input files in the input folder:
#   - expanded_variants.vcf.gz
#   - expanded_annotations.gff3.gz
#   - expanded_traits.csv

# Step 3: Update the config.json as:

{
  "vcf": "input/expanded_variants.vcf.gz",
  "gff": "input/expanded_annotations.gff3.gz",
  "traits": "input/expanded_traits.csv",
  "include_intergenic": true,
  "consequence_types": ["MODERATE", "HIGH", "LOW", "MODIFIER"],
  "output_format": "csv",
  "output": "output/filtered_variants.csv",
  "plot": true,
  "gwas": true,
  "output_dir": "output/"
}

# Step 4: Run the full pipeline
plantvarfilter run --config ~/Desktop/PlantTestRun/config.json

# Step 5 (optional): To regenerate only the Manhattan plot from a modified GWAS CSV,
# set config.json to include:
{
  "plot_only": true,
  "output_dir": "output/",
  "gwas_results": "output/gwas_basic_results.csv"
}

# Then run:
plantvarfilter plot-only --config ~/Desktop/PlantTestRun/config.json

Output Example

[Figure: Consequence Distribution plot]

[Figure: Variant Type Pie chart]

[Figure: Manhattan Plot]


Future Enhancements

  • Support for advanced GWAS models
  • Auto-generated PDF/HTML reports
  • Interactive Streamlit-based UI
  • REST API
  • Unit testing and test datasets

License

MIT License. See LICENSE for details.

