AGVD Variant Query Command Line Tool

AGVD Variant Query Tool

The AGVD Variant Query Tool is a command-line utility for querying variant information against the African Genome Variation Database (AGVD). It supports input from VCF, CSV, TSV, or Excel files and provides threshold-based filtering and clustering of variants using AGVD's GraphQL API.


🚀 Features

  • Supports VCF, CSV, TSV, and Excel input formats
  • Accepts both rsID and CHR_POS_REF_ALT variant formats
  • Submits queries in batches for improved performance
  • Optional local caching for repeated queries
  • Dry-run mode for validation without querying
  • Exports enriched results and JSON summary
  • Multithreaded for faster processing
  • Supports "peek" query mode for quick variant lookups
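To illustrate the batching idea, here is a minimal sketch of splitting a list of variant IDs into fixed-size chunks before submission. The helper name `batched` is hypothetical and not part of the tool; only the default size of 1000 comes from the documented `--BATCH` option.

```python
from typing import Iterator, List, Sequence


def batched(ids: Sequence[str], size: int = 1000) -> Iterator[List[str]]:
    """Yield consecutive slices of `ids`, each holding at most `size` items."""
    if size < 1:
        raise ValueError("size must be a positive integer")
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


# 2500 made-up rsIDs split into batches of 1000, 1000, and 500.
variant_ids = [f"rs{n}" for n in range(1, 2501)]
batches = list(batched(variant_ids, size=1000))
batch_sizes = [len(batch) for batch in batches]
```

Each batch would then correspond to one API request, so a smaller batch size means more requests with smaller payloads.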

📦 Requirements

  • Python 3.7+
  • Dependencies (installed via pip install -r requirements.txt):
    • pandas
    • tqdm
    • pysam
    • requests
    • openpyxl

🔧 Usage

python agvd \
  --KEY YOUR_AGVD_API_KEY \
  --INFILE path/to/input.vcf \
  --OUTPUT path/to/output.csv \
  --THRESHOLD 0.01

Optional Arguments:

Argument    Description
--BATCH     Batch size for API queries (default: 1000)
--COLUMN    Column name with variant IDs (CSV/TSV/Excel only)
--CHR       Chromosome column name
--POS       Position column name
--REF       Reference allele column name
--ALT       Alternate allele column name
--dry-run   Validates the file without submitting queries
--verbose   Enables debug-level logging
--cache     Enables local query caching
--threads   Number of threads to use for parallel processing
--peek      Provide a list of variant IDs (or input file) to run a quick lookup without thresholding
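As a rough picture of how --BATCH and --threads might interact, the sketch below sends several batches through a thread pool and merges the results. It is not the tool's implementation: `fake_query` is a hypothetical stand-in for a single API request, and `query_all` is an illustrative name.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List


def fake_query(batch: List[str]) -> Dict[str, str]:
    """Hypothetical stand-in for one API request covering one batch."""
    return {variant_id: "queried" for variant_id in batch}


def query_all(batches: Iterable[List[str]], threads: int = 4) -> Dict[str, str]:
    """Run one request per batch on a thread pool and merge the results."""
    merged: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # pool.map preserves the order of the input batches.
        for result in pool.map(fake_query, batches):
            merged.update(result)
    return merged


results = query_all([["rs1", "rs2"], ["rs3"], ["rs4", "rs5"]], threads=2)
```

Threads suit this kind of workload because each request spends most of its time waiting on the network rather than using the CPU.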

📂 Input Format Examples

VCF

Standard .vcf file with #CHROM, POS, REF, and ALT fields.
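For orientation, the sketch below pulls those four fields out of VCF text using plain Python. It is only an illustration of the file layout (the tool itself lists pysam as a dependency); the function name `iter_vcf_variants` and the sample records are made up.

```python
from typing import Iterable, Iterator, Tuple


def iter_vcf_variants(lines: Iterable[str]) -> Iterator[Tuple[str, int, str, str]]:
    """Yield (chrom, pos, ref, alt) for every ALT allele in VCF text lines."""
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines, "##" meta lines, and the "#CHROM" header line.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # VCF column order: CHROM, POS, ID, REF, ALT, ...
        chrom, pos, ref, alts = fields[0], int(fields[1]), fields[3], fields[4]
        # Multi-allelic sites list several ALT alleles separated by commas.
        for alt in alts.split(","):
            yield chrom, pos, ref, alt


sample_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t12345\trs123\tA\tG\t.\tPASS\t.",
    "2\t500\t.\tC\tT,G\t.\tPASS\t.",
]
variants = list(iter_vcf_variants(sample_vcf))
```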

CSV/TSV/Excel

Either:

  • Single column with rsID or CHR_POS_REF_ALT format
  • Separate columns for --CHR, --POS, --REF, --ALT
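The two layouts above can be sketched as a small classifier plus a column joiner. The regular expressions are assumptions about what counts as a valid ID, not the tool's own validation rules; in particular, accepting ":" and "-" as separators alongside "_" is a guess based on the examples elsewhere in this document. The names `classify_variant` and `join_columns` are hypothetical.

```python
import re

# Assumed patterns; the tool's real validation may differ.
RSID = re.compile(r"^rs\d+$", re.IGNORECASE)
POSITIONAL = re.compile(
    r"^(?:chr)?(?P<chrom>[0-9]{1,2}|X|Y|MT?)"
    r"[_:-](?P<pos>\d+)[_:-](?P<ref>[ACGTN]+)[_:-](?P<alt>[ACGTN]+)$",
    re.IGNORECASE,
)


def classify_variant(value: str) -> str:
    """Return 'rsid', 'positional', or 'unknown' for a single ID string."""
    text = value.strip()
    if RSID.match(text):
        return "rsid"
    if POSITIONAL.match(text):
        return "positional"
    return "unknown"


def join_columns(chrom: str, pos: int, ref: str, alt: str) -> str:
    """Build a CHR_POS_REF_ALT identifier from four separate column values."""
    return f"{chrom}_{pos}_{ref}_{alt}"


kinds = [classify_variant(v) for v in ["rs123", "1_12345_A_G", "chr1:12345:A:G", "hello"]]
joined = join_columns("1", 12345, "A", "G")
```

A check like this is the sort of validation a dry run could perform before any query is submitted.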

🧪 Output

  • An output file containing the original input plus these columns:
    • AGVDCUTOFF: status based on MAF threshold
    • African_MAF: MAF value
    • <Cluster>_MAF: MAF per population cluster
  • A _summary.json with success/failure statistics
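Once the enriched file exists, downstream filtering on the African_MAF column is straightforward. The sketch below assumes a CSV output and assumes that variants without a frequency have an empty cell; the `variant` column name and the sample rows are made up for illustration.

```python
import csv
import io
from typing import Dict, List, TextIO


def rare_variants(handle: TextIO, threshold: float = 0.01,
                  maf_column: str = "African_MAF") -> List[Dict[str, str]]:
    """Return rows whose MAF parses as a number and is below `threshold`."""
    rare = []
    for row in csv.DictReader(handle):
        raw = (row.get(maf_column) or "").strip()
        try:
            maf = float(raw)
        except ValueError:
            # Empty or non-numeric cell: no frequency available, so skip it.
            continue
        if maf < threshold:
            rare.append(row)
    return rare


# Made-up rows standing in for an enriched output file.
sample_output = io.StringIO(
    "variant,African_MAF\n"
    "rs123,0.004\n"
    "rs456,0.25\n"
    "rs789,\n"
)
rare_ids = [row["variant"] for row in rare_variants(sample_output)]
```

With a real file, pass `open("out.csv", newline="")` in place of the in-memory sample.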

🔍 Peek Mode

Peek mode lets you quickly retrieve availability and access URLs for variants, without threshold-based filtering.

From file:

python agvd --peek --INFILE variants.txt

From inline list:

python agvd --peek rs123 rs456 chr1:12345:A:G

Returns:

[
  {
    "id": "rs123",
    "status": "available",
    "url": "https://agvd.afrigen-d.org/variant?id=rs123"
  },
  {
    "id": "1-12345-A-G",
    "status": "unavailable",
    "url": null
  }
]

You can also call this in Python:

from agvd.query import peek_variants
results = peek_variants(["rs123", "chr1:12345:A:G"])
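A result list of the shape shown above can be split into available and unavailable variants with a few lines of standard-library code. This sketch works on the JSON text directly, so it does not depend on the agvd package; the variable names are illustrative.

```python
import json

# The same JSON structure as the peek example above.
peek_output = """
[
  {"id": "rs123", "status": "available",
   "url": "https://agvd.afrigen-d.org/variant?id=rs123"},
  {"id": "1-12345-A-G", "status": "unavailable", "url": null}
]
"""

records = json.loads(peek_output)

# Map each available variant to its access URL.
available_urls = {r["id"]: r["url"] for r in records if r["status"] == "available"}

# Collect the IDs that have no entry in the database.
unavailable = [r["id"] for r in records if r["status"] != "available"]
```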

🛠 Development

To test locally:

python agvd \
  -k test_key \
  -i examples/test.csv \
  -o out.csv \
  -t 0.05 \
  --verbose

To profile performance:

python -m cProfile agvd ...
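To read profiling results programmatically, the standard library's pstats module can sort and print the most expensive calls. The sketch below profiles a placeholder function (`busy_work`, which is not part of the tool) to show the workflow; the same `pstats.Stats` calls also accept a file saved with `python -m cProfile -o`.

```python
import cProfile
import io
import pstats


def busy_work(n: int = 20000) -> int:
    """Placeholder workload standing in for the code being profiled."""
    return sum(i * i for i in range(n))


profiler = cProfile.Profile()
profiler.enable()
busy_work()
profiler.disable()

# Capture the report as text instead of printing it to stdout.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)  # five most expensive entries
report = buffer.getvalue()
```

Sorting by cumulative time highlights the call paths where the program spends most of its time, which is usually the best place to start optimizing.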

🧾 License

MIT License © 2025 AGVD Team


📬 Contact

For support or questions, please contact: agvd@afrigen-d.org

