Genomic allele frequency query engine with bitmap-encoded genotypes

AFQuery

AFQuery enables fast allele frequency queries on user-defined subsets of local genomic cohorts, without rescanning VCFs.

AFQuery is a bitmap-indexed engine that efficiently recomputes AC/AN/AF for dynamically defined subcohorts (e.g., by phenotype, sex, or sequencing technology), a common requirement in ACMG/AMP variant classification. It stores per-variant genotype data as Roaring Bitmaps in Parquet files and resolves sample filters into bitmaps that can be intersected in microseconds, enabling sub-100 ms queries on large cohorts. The system accounts for ploidy on sex chromosomes, adjusts AN according to each sample's sequencing technology and capture regions, and supports incremental updates. It runs locally on plain files (Parquet + SQLite), with no server or cloud infrastructure.

Full Documentation→

When to use AFQuery

You need allele frequencies for phenotype or user-defined subcohorts
You work with mixed sequencing technologies or capture kit versions (WGS, WES, targeted panels)
You require fast, repeated queries without rescanning VCFs
You want a local, reproducible workflow without cloud or cluster dependencies

Features

Dynamic subcohort queries (<100 ms): bitmap intersections at query time; no VCF re-scan required
Technology-aware: avoids AN bias when mixing WGS, WES, and panels by applying a separate BED capture file per technology
Ploidy-aware: correct handling of sex chromosomes (PAR/non-PAR, chrX, chrY)
ACMG-compatible allele counting: AC/AN/AF computed per standard definitions
Flexible metadata filtering: arbitrary labels (ICD-10, HPO, custom fields) with inclusion/exclusion rules
Incremental updates: add or remove samples and update metadata without rebuilding the database
VCF annotation: annotate variants with subcohort-specific frequencies
FILTER/call quality tracking: failed calls (FILTER!=PASS) tracked per variant and reported as N_FAIL
Batch and region queries: query a single locus, a genomic region, or a list of variants from a file
Bulk CSV export: export all variant frequencies with optional disaggregation by sex, technology, or phenotype
Audit changelog: all database operations logged with timestamps and operator notes
Database validation: integrity checks with exit codes suitable for scripting
Portable and serverless: file-based system; no infrastructure required
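Ploidy-aware counting comes down to a per-sample rule: how many allele copies a sample contributes to AN at a given locus. The sketch below illustrates that rule and is not AFQuery's code. The `ploidy` and `in_chrx_par` helpers are hypothetical. The sketch assumes GRCh38 chrX pseudoautosomal coordinates (PAR1 10,001-2,781,479; PAR2 155,701,383-156,030,895; 1-based, inclusive). It treats chrY as haploid in males and absent in females, without special-casing the chrY PARs.

```python
# Hypothetical helper, not part of AFQuery's API.
# GRCh38 chrX pseudoautosomal regions (PAR1, PAR2), 1-based inclusive coordinates.
GRCH38_CHRX_PAR = ((10_001, 2_781_479), (155_701_383, 156_030_895))


def in_chrx_par(pos: int) -> bool:
    """True if a 1-based chrX position lies in PAR1 or PAR2 (GRCh38)."""
    return any(start <= pos <= end for start, end in GRCH38_CHRX_PAR)


def ploidy(chrom: str, pos: int, sex: str) -> int:
    """Allele copies one sample contributes to AN at chrom:pos.

    Assumes sex is "male" or "female" and does not special-case the chrY PARs.
    """
    name = chrom[3:] if chrom.startswith("chr") else chrom
    if name == "X":
        # Males are haploid outside the PARs; inside them they behave as diploid.
        return 1 if sex == "male" and not in_chrx_par(pos) else 2
    if name == "Y":
        return 1 if sex == "male" else 0  # females carry no chrY
    return 2  # autosomes; chrM is out of scope for this sketch
```

For example, a male contributes one allele at chrX:5,000,000 (non-PAR) but two at chrX:1,000,000 (inside PAR1).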

Performance

Query latency: <100 ms (tested up to 50,000 samples)
Storage: ~2 bytes/sample/variant
Scales to millions of variants per chromosome
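As a rough sizing aid, the storage figure above can be multiplied out. This hypothetical helper only does that arithmetic. It assumes the ~2 bytes/sample/variant average holds for your cohort, which may not be the case, because bitmap sizes depend on the data.

```python
def estimate_storage_gb(n_samples: int, n_variants: int, bytes_per_sample_variant: float = 2.0) -> float:
    """Rough database size in GB (10**9 bytes), from the ~2 bytes/sample/variant figure."""
    return n_samples * n_variants * bytes_per_sample_variant / 1e9


# 10,000 samples x 1,000,000 variants x 2 bytes = 2e10 bytes, i.e. about 20 GB.
print(estimate_storage_gb(10_000, 1_000_000))  # 20.0
```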

Comparison with Alternative Tools

Feature	AFQuery	bcftools	GATK GenomicsDB	Hail
Technology-aware AN	Yes	No	No	No
Metadata filtering	Arbitrary labels	No	No	Custom code
Ploidy-aware sex chromosomes	Yes	Manual	No	Manual
Dynamic subcohort queries	Yes	No	Limited	Requires code
FILTER/call quality tracking	Per variant	Manual	No	Manual
Incremental updates	Yes	No	Yes	No
Infrastructure required	None	None	Java/server	Spark cluster
Query latency (50K samples)	<100 ms	~5 min	<1 min	1–2 min

Algorithm Overview

AFQuery pre-indexes per-variant genotype data as Roaring Bitmaps stored in Parquet files. Each variant row holds three bitmaps: heterozygous carriers, homozygous alt carriers, and samples with FILTER!=PASS. Sample metadata (sex, phenotype, technology) is pre-serialized as bitmaps in SQLite.
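The storage layout described above can be pictured in plain Python. This sketch is an illustration, not AFQuery's code:

- Python integers stand in for Roaring Bitmaps (bit i set means sample i is in the set).
- A dict stands in for the SQLite metadata store.
- `VariantRow`, `encode_variant`, and `label_bitmaps` are hypothetical names.
- Records are assumed to be biallelic and normalized.
- Haploid alt calls (e.g. a male chrX `1`) are filed under `hom_alt`, which is an assumption.

```python
from dataclasses import dataclass

# Stand-in for a Roaring Bitmap: bit i of the int is set when sample i is a member.
Bitmap = int


@dataclass
class VariantRow:
    """One variant row holding the three per-variant bitmaps."""
    chrom: str
    pos: int
    ref: str
    alt: str
    het: Bitmap = 0      # heterozygous carriers
    hom_alt: Bitmap = 0  # homozygous-alt carriers (haploid alt calls too, by assumption)
    fail: Bitmap = 0     # samples whose call has FILTER != PASS


def encode_variant(chrom, pos, ref, alt, calls):
    """Build a VariantRow from (sample_index, GT, FILTER) tuples of a biallelic record."""
    row = VariantRow(chrom, pos, ref, alt)
    for idx, gt, flt in calls:
        alleles = gt.replace("|", "/").split("/")
        n_alt = sum(a == "1" for a in alleles)
        if n_alt == 1 and len(alleles) == 2:
            row.het |= 1 << idx
        elif n_alt > 0 and n_alt == len(alleles):
            row.hom_alt |= 1 << idx
        if flt != "PASS":
            row.fail |= 1 << idx
    return row


def label_bitmaps(samples):
    """Pre-compute one bitmap per "field:value" label (a dict here; AFQuery uses SQLite).

    samples: list of metadata dicts, one per sample index; a value may be a list
    (e.g. several phenotype codes for one sample).
    """
    out = {}
    for idx, meta in enumerate(samples):
        for field, value in meta.items():
            for v in value if isinstance(value, list) else [value]:
                key = f"{field}:{v}"
                out[key] = out.get(key, 0) | (1 << idx)
    return out
```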

At query time, the requested sample filter is resolved to a single candidate bitmap via bitmap intersections and differences — taking microseconds regardless of cohort size. For each variant, the candidate bitmap is intersected with the genotype bitmaps to compute AC/AN/AF. AN accounts for WES capture regions (via BED-indexed interval trees) and for ploidy on sex chromosomes (males are haploid on non-PAR chrX and chrY).
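The query path described above can be sketched in the same spirit. This is an illustration rather than AFQuery's implementation. Python integers stand in for Roaring Bitmaps, and the function names are hypothetical. The sketch assumes that:

1. `covered` is the bitmap of samples whose technology's capture region includes the locus.
2. The caller supplies per-sex ploidy for the locus: 2/2 on autosomes and PARs, 1/2 on non-PAR chrX, and 1/0 on chrY (male/female).
3. Samples not in `male` are treated as female.
4. FILTER!=PASS samples are counted in N_FAIL but not removed from AC or AN, because the text above does not say how AFQuery treats them.

```python
from typing import NamedTuple, Optional


class Freq(NamedTuple):
    ac: int
    an: int
    af: Optional[float]
    n_fail: int


def popcount(bm: int) -> int:
    """Number of samples in a bitmap."""
    return bin(bm).count("1")


def resolve_candidates(all_samples, labels, include_any=(), require=(), exclude=()):
    """Resolve a sample filter into one candidate bitmap.

    include_any: labels combined by union (e.g. several phenotype codes);
    require: labels every candidate must carry (intersection), e.g. "sex:female";
    exclude: labels removed by set difference.
    """
    cand = all_samples
    if include_any:
        union = 0
        for label in include_any:
            union |= labels.get(label, 0)
        cand &= union
    for label in require:
        cand &= labels.get(label, 0)
    for label in exclude:
        cand &= ~labels.get(label, 0)
    return cand


def compute_freq(cand, het, hom_alt, fail, covered, male, male_ploidy, female_ploidy):
    """AC/AN/AF and N_FAIL for one variant and one candidate bitmap."""
    eligible = cand & covered  # candidates whose capture region includes the locus
    males = eligible & male
    females = eligible & ~male  # everyone not flagged male is treated as female
    an = male_ploidy * popcount(males) + female_ploidy * popcount(females)
    # A het call adds one alt allele; a hom-alt call adds one per copy (the ploidy).
    ac = (popcount(het & eligible)
          + male_ploidy * popcount(hom_alt & males)
          + female_ploidy * popcount(hom_alt & females))
    n_fail = popcount(fail & cand)
    return Freq(ac, an, ac / an if an else None, n_fail)
```

Because each step is a bitwise AND, OR, or AND-NOT over whole cohorts, the cost of resolving a filter does not grow with the number of variants queried.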

Input Requirements

VCF files: normalized and consistent with the selected genome build (GRCh37 or GRCh38)
Sample metadata: must include sex, sequencing technology, and any fields used for filtering (e.g., phenotype)
BED files (optional): define capture regions for each sequencing technology
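Technology-aware AN depends on knowing whether each technology's capture regions cover a locus. The sketch below answers that from BED lines and is not AFQuery's implementation. It uses bisection over merged intervals instead of the interval trees mentioned above; the two are equivalent for single-position lookups. The names `CaptureIndex` and `load_bed` are hypothetical. The detail that matters is the coordinate convention: BED intervals are 0-based and half-open, while VCF POS is 1-based.

```python
import bisect


class CaptureIndex:
    """Merged, sorted BED intervals for one chromosome of one capture kit."""

    def __init__(self, intervals):
        merged = []
        for start, end in sorted(intervals):
            # Half-open intervals that touch or overlap are merged into one.
            if merged and start <= merged[-1][1]:
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        self.starts = [s for s, _ in merged]
        self.ends = [e for _, e in merged]

    def covers(self, pos_1based: int) -> bool:
        """True if a 1-based VCF position lies inside a 0-based half-open BED interval."""
        p = pos_1based - 1  # convert to 0-based
        i = bisect.bisect_right(self.starts, p) - 1
        return i >= 0 and p < self.ends[i]


def load_bed(lines):
    """Parse BED lines into {chrom: CaptureIndex}; blank, comment, track and browser lines are skipped."""
    by_chrom = {}
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split("\t")[:3]
        by_chrom.setdefault(chrom, []).append((int(start), int(end)))
    return {c: CaptureIndex(iv) for c, iv in by_chrom.items()}
```

A BED line `chr1	100	200` therefore covers VCF positions 101 through 200.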

Quick Start

Example workflow from raw VCFs to query, export, and annotation:

pip install afquery
# Docker: see Installation docs for docker pull / run usage

# Build the database
afquery create-db --manifest samples.tsv --output-dir ./db/ --genome-build GRCh38

# Inspect the database
afquery info --db ./db/

# Query a single position, filtered by phenotype and sex
afquery query --db ./db/ --locus chr1:925952 --phenotype E11.9 --sex female

# Query a genomic region
afquery query --db ./db/ --region chr1:900000-1000000

# Export BRCA1 variant frequencies to CSV
afquery dump --db ./db/ --output all_variants.csv --chrom chr17 --start 43044292 --end 43170327

# Annotate a VCF with cohort frequencies
afquery annotate --db ./db/ --input patient.vcf --output annotated.vcf --threads 12

# Add new samples to an existing database
afquery update-db --db ./db/ --add-samples new_samples.tsv
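For repeated queries from a script (for example, looping over several phenotypes), the CLI can be driven from Python. This is a hypothetical wrapper, not part of the afquery package. It uses only the `query` flags that appear in the commands above and assumes `--locus` and `--region` are mutually exclusive. It makes no assumption about the output format and returns stdout as text.

```python
import subprocess


def build_query_cmd(db, locus=None, region=None, phenotype=None, sex=None):
    """Assemble an `afquery query` argument list from the flags shown in the Quick Start."""
    if (locus is None) == (region is None):
        raise ValueError("pass exactly one of locus or region")
    cmd = ["afquery", "query", "--db", db]
    cmd += ["--locus", locus] if locus is not None else ["--region", region]
    if phenotype:
        cmd += ["--phenotype", phenotype]
    if sex:
        cmd += ["--sex", sex]
    return cmd


def run_query(**kwargs) -> str:
    """Run the CLI and return its stdout; raises CalledProcessError on a non-zero exit."""
    result = subprocess.run(build_query_cmd(**kwargs), capture_output=True, text=True, check=True)
    return result.stdout
```

Passing an argument list rather than a shell string avoids quoting problems with region strings and phenotype codes.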

Documentation

Citation

If you use AFQuery, please cite:

AFQuery: fast, metadata-aware allele frequency queries on local genomic cohorts.
(manuscript in preparation)

Release history

0.2.2: Mar 24, 2026
0.2.1: Mar 23, 2026
0.2.0 (this version): Mar 23, 2026
0.1.4: Mar 16, 2026
0.1.3: Mar 16, 2026
0.1.2.2: Mar 18, 2026
0.1.2.1: Mar 16, 2026
0.1.2: Mar 16, 2026
0.1.1: Mar 16, 2026
0.1.0: Mar 16, 2026

Download files

Source Distribution

afquery-0.2.0.tar.gz (162.3 kB view details)

Uploaded Mar 23, 2026 Source

Built Distribution

afquery-0.2.0-py3-none-any.whl (63.7 kB view details)

Uploaded Mar 23, 2026 Python 3

File details

Details for the file afquery-0.2.0.tar.gz.

File metadata

Download URL: afquery-0.2.0.tar.gz
Upload date: Mar 23, 2026
Size: 162.3 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for afquery-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`5ec1b950d008548997ba5fa22df28bbbfdd132a669ab0dc314536a74fd0392e2`
MD5	`8e53866ad6656900fdba5519efc3a17f`
BLAKE2b-256	`a1de389ba88f1db5c814ef1f20f09ba8f52c0694b5c78b774bd81ab9b56f63c6`

Provenance

The following attestation bundles were made for afquery-0.2.0.tar.gz:

Publisher: release.yml on dlopez-bioinfo/afquery

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: afquery-0.2.0.tar.gz
- Subject digest: 5ec1b950d008548997ba5fa22df28bbbfdd132a669ab0dc314536a74fd0392e2
- Sigstore transparency entry: 1159601483
- Sigstore integration time: Mar 23, 2026
Source repository:
- Permalink: dlopez-bioinfo/afquery@3b270eda68e3a42e41ee7f4bf0326ae2a6a0b807
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/dlopez-bioinfo
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@3b270eda68e3a42e41ee7f4bf0326ae2a6a0b807
- Trigger Event: push

File details

Details for the file afquery-0.2.0-py3-none-any.whl.

File metadata

Download URL: afquery-0.2.0-py3-none-any.whl
Upload date: Mar 23, 2026
Size: 63.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for afquery-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`3b2edb20cf67662ecbe585ffbd359e22ea0adeb477c4912b7a7da7da2afc96ec`
MD5	`82a97c0be351adc874a34c20cb523bc0`
BLAKE2b-256	`03fdcd1b431b3fddc222b662b3374506f4995879ee4e8e0c44468efe9cb1f6ee`

Provenance

The following attestation bundles were made for afquery-0.2.0-py3-none-any.whl:

Publisher: release.yml on dlopez-bioinfo/afquery

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: afquery-0.2.0-py3-none-any.whl
- Subject digest: 3b2edb20cf67662ecbe585ffbd359e22ea0adeb477c4912b7a7da7da2afc96ec
- Sigstore transparency entry: 1159601553
- Sigstore integration time: Mar 23, 2026
Source repository:
- Permalink: dlopez-bioinfo/afquery@3b270eda68e3a42e41ee7f4bf0326ae2a6a0b807
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/dlopez-bioinfo
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@3b270eda68e3a42e41ee7f4bf0326ae2a6a0b807
- Trigger Event: push
