
Call ISBT alleles from VCF/s

Project description

RBCeq2 reads genomic variant data in the form of variant call format (VCF) files and outputs blood group (BG) genotypes and phenotypes. The user documentation (a Word document) explains how RBCeq2 constructs possible allele combinations and then filters them until only possible genotype/phenotype combinations remain. Internally, all variants are defined against human genome release GRCh37 or GRCh38, and in the context of a variant, ‘_ref’ means no change to the reference nucleotide at the associated position.

Overview

At the highest level, RBCeq2 finds all possible alleles, then filters out those that fail certain logic checks. This leaves an auditable trail of why it reached a given result. Every effort has been made to be explicit, both in encoding alleles in our database and in writing the code. This makes the results verbose but unambiguous. Some tools in this space employ Bayesian or other likelihood methods to filter results; we chose not to do that initially but might add it as an optional feature in the future. Lastly, some liberties have been taken to standardise syntax and nomenclature across blood groups.
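The generate-then-filter idea can be illustrated with a toy sketch. This is not RBCeq2's code: the function `filter_with_audit`, the `AuditedCall` class, the check name, and the allele and variant names are all hypothetical, chosen only to show how each rejected candidate can be recorded together with the check that rejected it.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class AuditedCall:
    """Candidates that passed every check, plus why the others were dropped."""

    kept: list[str] = field(default_factory=list)
    dropped: dict[str, str] = field(default_factory=dict)


def filter_with_audit(
    candidates: list[str],
    checks: list[tuple[str, Callable[[str], bool]]],
) -> AuditedCall:
    """Apply named checks in order; record the first check each candidate fails."""
    result = AuditedCall()
    for candidate in candidates:
        failed = next((name for name, check in checks if not check(candidate)), None)
        if failed is None:
            result.kept.append(candidate)
        else:
            result.dropped[candidate] = failed
    return result


# Toy data (hypothetical): variants that define each allele, and variants seen in a sample.
TOY_DEFINITIONS = {"BG*01": {"1:100_ref"}, "BG*02": {"1:100_A_G"}}
observed = {"1:100_ref"}

checks = [
    ("defining_variants_present", lambda allele: TOY_DEFINITIONS[allele] <= observed),
]
audit = filter_with_audit(list(TOY_DEFINITIONS), checks)
print(audit.kept)
print(audit.dropped)
```

Keeping the name of the failed check next to each dropped candidate is what makes the result auditable: the output shows both what was called and why each alternative was excluded.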

This initial release of RBCeq2 is focused on perfecting the calling of International Society of Blood Transfusion (ISBT) defined BG alleles from simple variants (i.e. single nucleotide variants [SNVs] and small indels) that can be found in standard short-read or microarray-derived VCFs. It also supports long-read VCFs (i.e. large indels and phased data), although these features are less polished.

Bugs

This software is extensively tested and accurately reports genotypes/phenotypes based on our in-house definitions of the ‘correct’ answer; however, there are some cases where the ‘correct’ answer is subjective. These docs are detailed: if you find what you think is a bug in the results from RBCeq2, please take the time to understand whether it is in line with what we intended. We will endeavor to fix any black-and-white bugs in less than one week. Most of these will be rare variants that are encoded incorrectly in our database. We also value any and all feedback and feature requests.

How To

Install via pip (Python 3.12+) or clone the git repository:

pip install RBCeq2

Show help:

rbceq2 -h

Usage:

usage: py main.py --vcf example_multi_sample.vcf.gz --out example --reference_genome GRCh37

options:
  -h, show this help message and exit
  --vcf       Path to vcf file/s
  --out       Prefix for output files (default: None)
  --allele_depth   Allele depth (default: 10)
  --genotype_quality   Genotype quality (default: 10)
  --processes       Number of processes (default: 1)
                    #More is faster, so long as you have that many CPUs and matched RAM (1:1) available
  --reference_genome GRCh37/8 (default: None)
                    #If your results are all reference/wildtype then you probably got this wrong
  --phased    Use phase information (default: False)
  --microarray   Input is from a microarray. (default: False)
  --debug     Enable debug logging. If not set, logging will be at info level. (default: False)
                    #Turn on to generate output like what is used in the examples in the user documentation
  --validate  Enable VCF validation. Doubles run time (default: False)
                    #Not normally needed
  --PDFs Make single sample PDF reports
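When RBCeq2 is driven from a script, it can help to assemble the command line in one place. The sketch below is a minimal example that uses only options listed above (`--vcf`, `--out`, `--reference_genome`, `--processes`). It assumes the installed `rbceq2` entry point accepts the same options as the usage line; the helper name `build_rbceq2_command` is hypothetical and not part of RBCeq2.

```python
import shlex

# The two genome builds named in the documentation.
VALID_GENOMES = {"GRCh37", "GRCh38"}


def build_rbceq2_command(
    vcf: str, out: str, reference_genome: str, processes: int = 1
) -> list[str]:
    """Return an argument list for the rbceq2 CLI (assumed entry point name)."""
    if reference_genome not in VALID_GENOMES:
        raise ValueError(f"reference_genome must be one of {sorted(VALID_GENOMES)}")
    if processes < 1:
        raise ValueError("processes must be at least 1")
    return [
        "rbceq2",
        "--vcf", str(vcf),
        "--out", str(out),
        "--reference_genome", reference_genome,
        "--processes", str(processes),
    ]


command = build_rbceq2_command("example_multi_sample.vcf.gz", "example", "GRCh37")
print(shlex.join(command))
```

The list can be passed to `subprocess.run(command, check=True)`. Validating the genome build up front guards against the all-reference results described in the note on `--reference_genome`.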

Output

RBCeq2 generates a log.txt, three programmatically parsable TSVs, and a folder with one PDF per sample. The TSVs (one for the genotype and two for the phenotype, numeric and alphanumeric separately) have BGs as column names and sample names (from the VCF) as rows.
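A minimal sketch of loading one of these TSVs with the standard library follows. It assumes the first row is the header and the first column holds the sample name; the function name `read_bg_table` is hypothetical, and the in-memory text stands in for a real output file, whose name is not specified here.

```python
import csv
import io


def read_bg_table(handle) -> dict[str, dict[str, str]]:
    """Map sample name -> {blood group: cell text} from a tab-separated table."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    blood_groups = header[1:]  # assumption: the first column is the sample name
    table = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        table[row[0]] = dict(zip(blood_groups, row[1:]))
    return table


# Stand-in for a genotype TSV, using values from the 1000 Genomes examples.
example = (
    "\tA4GALT\tABCB6\n"
    "GM18856\tA4GALT*01/A4GALT*01.02\tABCB6*01/ABCB6*01\n"
)
table = read_bg_table(io.StringIO(example))
print(table["GM18856"]["ABCB6"])
```

For a file on disk, open it with `open(path, newline="")` and pass the handle to the same function.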

1000 Genomes Examples

Genotypes

Sample | A4GALT | ABCB6 | ABCC1
GM18501 | A4GALT*01/A4GALT*02, A4GALT*01/A4GALT*02.02 | ABCB6*01/ABCB6*01W.02 | ABCC1*01/ABCC1*01
GM18519 | A4GALT*01.02/A4GALT*01.02 | ABCB6*01/ABCB6*01W.02 | ABCC1*01/ABCC1*01
GM18856 | A4GALT*01/A4GALT*01.02 | ABCB6*01/ABCB6*01 | ABCC1*01/ABCC1*01

Note: In the genotypes TSV, the two alleles of a pair are separated by /, and if multiple pairs are possible the pairs are separated by commas.
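A genotype cell can be split back into allele pairs along these two separators. The sketch below assumes that allele names never contain `/` or `,`; the function name `parse_genotype_cell` is hypothetical.

```python
def parse_genotype_cell(cell: str) -> list[tuple[str, str]]:
    """Split a genotype cell into (allele1, allele2) pairs."""
    pairs = []
    for pair in cell.split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate empty cells and trailing commas
        allele1, allele2 = pair.split("/")  # raises ValueError if not exactly two alleles
        pairs.append((allele1, allele2))
    return pairs


pairs = parse_genotype_cell("A4GALT*01/A4GALT*02, A4GALT*01/A4GALT*02.02")
for allele1, allele2 in pairs:
    print(allele1, allele2)
```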

Phenotype Alphanumeric

Sample | A4GALT | ABCB6 | ABCC1
GM18501 | P1+,Pk+,(P2+) | Lan+ | WLF+
GM18519 | P1+,Pk+ | Lan+ | WLF+
GM18856 | P1+,Pk+ | Lan+ | WLF+

Phenotype Numeric

Sample | A4GALT | ABCB6 | ABCC1
GM18501 | | | ABCC1:1
GM18519 | | | ABCC1:1
GM18856 | | | ABCC1:1

In the phenotype TSVs, antigens are separated by commas, and if multiple phenotypes are possible they are separated by " | " (e.g., DO:1,2 | DO:1,2,5). Note that not all BGs have both a numeric and an alphanumeric phenotype.

Further details

Please see the RBCeq2 user documentation Word doc

