Generic toolkit for processing DNA polymorphism data
AdaGenes
AdaGenes is a generic toolkit for processing, annotating, filtering and transforming DNA polymorphism data.
Main features:
- A powerful data object to store and edit DNA mutation data
- Functionality to read and write files in common genomics file formats, including VCF, MAF, CSV/TSV, XLSX and plain text files
- Effective variant filtering according to specific threshold or feature values
- Liftover of genome positions between the hg38/GRCh38, hg19/GRCh37 and T2T-CHM13 reference genomes
- Effective variant normalization in VCF and HGVS notation
Installation
pip install adagenes
Getting started
Read files
Start by reading a data file in one of the supported file formats into a biomarker frame with the read_file() function. AdaGenes automatically identifies the file type and initiates the corresponding file reader. You can also initiate a file reader manually and call its read_file() function:
import adagenes as av
bframe = av.read_file("data/somaticMutations.vcf")
# Print biomarker identifiers
print(bframe.get_ids())
# Print loaded variant data completely
print(bframe.data)
If the variant data has been parsed correctly, the data of the biomarker frame should be a nested JSON dictionary:
{
'chr7:140753336A>T': {'variant_data': {'CHROM': '7', 'POS': '140753336', 'ID': '.', 'REF': 'A', 'ALT': 'T', 'QUAL': '100', ... } },
'chr1:2556664C>.': {'variant_data': {'CHROM': '1', 'POS': '2556664', 'ID': '.', ... } }
}
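Because the data is an ordinary nested dictionary, it can be inspected with plain Python. The following is a minimal sketch, not part of the AdaGenes API: the `data` dictionary is a stand-in built from the fields shown above (elided fields are simply left out), and `parse_variant_id` and `filter_by_min_qual` are hypothetical helper names introduced here for illustration.

```python
import re

# Stand-in for bframe.data, limited to the fields shown in the example above.
data = {
    "chr7:140753336A>T": {
        "variant_data": {"CHROM": "7", "POS": "140753336", "ID": ".",
                         "REF": "A", "ALT": "T", "QUAL": "100"}
    },
    "chr1:2556664C>.": {
        "variant_data": {"CHROM": "1", "POS": "2556664", "ID": "."}
    },
}

# Identifiers in the example have the form chr<CHROM>:<POS><REF>><ALT>.
ID_PATTERN = re.compile(
    r"^chr(?P<chrom>[^:]+):(?P<pos>\d+)(?P<ref>[A-Za-z.]+)>(?P<alt>[A-Za-z.]+)$"
)


def parse_variant_id(variant_id):
    """Split an identifier into its parts; return None if it does not match."""
    match = ID_PATTERN.match(variant_id)
    if match is None:
        return None
    return {
        "chrom": match["chrom"],
        "pos": int(match["pos"]),
        "ref": match["ref"],
        "alt": match["alt"],
    }


def filter_by_min_qual(data, min_qual):
    """Keep only variants whose QUAL field is numeric and >= min_qual."""
    kept = {}
    for variant_id, record in data.items():
        qual = record.get("variant_data", {}).get("QUAL")
        if qual is None:
            continue  # no QUAL value available for this variant
        try:
            if float(qual) >= min_qual:
                kept[variant_id] = record
        except ValueError:
            continue  # non-numeric QUAL such as "."
    return kept


for variant_id in data:
    print(variant_id, parse_variant_id(variant_id))
print(list(filter_by_min_qual(data, 50)))
```

The helpers only read the dictionary, so the same functions could be tried on the `data` attribute of a real biomarker frame, assuming it follows the layout shown above.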
Variant notations and normalization
Filter mutations
Liftover
Convert the genomic positions of variants between genome assemblies with the liftover function (GRCh37 / GRCh38 / T2T-CHM13):
import adagenes as ag
# Load a biomarker frame and specify its genome version (hg19/hg38/t2t)
infile = "somaticMutations.vcf"
bframe = ag.read_file(infile, genome_version="hg38")
# Liftover to another genome assembly
bframe_t2t = ag.liftover(bframe, target_genome="t2t")
# Write the new biomarker frame in T2T to a file
ag.write_file("somaticMutations.t2t.vcf", bframe_t2t)
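To give an intuition for what a coordinate liftover does, here is a toy sketch of block-based position mapping. It is not how `ag.liftover` is implemented. The blocks and their coordinates are invented for illustration, `toy_liftover` is a hypothetical name, and real conversions rely on chain files that also account for chromosomes and strand orientation.

```python
from bisect import bisect_right

# Hypothetical aligned blocks: (source_start, source_end_exclusive, target_start).
# The numbers are made up; they are not real hg38 or T2T-CHM13 coordinates.
TOY_BLOCKS = [
    (1000, 2000, 1500),
    (2500, 3000, 2900),
]


def toy_liftover(pos, blocks=TOY_BLOCKS):
    """Map a source position to the target assembly, or return None.

    A position maps only if it falls inside an aligned block; positions in
    the gaps between blocks have no counterpart in the target assembly.
    """
    starts = [block[0] for block in blocks]
    index = bisect_right(starts, pos) - 1  # last block starting at or before pos
    if index < 0:
        return None
    source_start, source_end, target_start = blocks[index]
    if pos >= source_end:
        return None  # pos lies in the gap after this block
    return target_start + (pos - source_start)


for position in (1000, 1999, 2200, 2750):
    print(position, "->", toy_liftover(position))
```

A position that returns `None` in this sketch corresponds to a variant that cannot be converted because its region is not aligned between the two assemblies.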
Visualization
Annotate variants
You can easily annotate variant data by combining an AdaGenes biomarker frame with the Onkopus annotation framework:
pip install onkopus
Annotate the variant data of a biomarker frame by calling an Onkopus client directly on bframe.data:
import adagenes as av
import onkopus as op
genome_version="hg38"
bframe = av.read_file("somaticMutations.vcf", genome_version="hg38")
# Annotate with all Onkopus modules
bframe.data = op.annotate(bframe.data)
# Annotate with specific modules
bframe.data = op.AlphaMissenseClient(genome_version=genome_version).process_data(bframe.data)
bframe.data = op.GENCODEClient(genome_version=genome_version).process_data(bframe.data)
av.write_file("somaticMutations.annotated.avf", bframe)
Save data
Write a biomarker frame to a file with write_file() in one of the supported file formats (.vcf, .maf, .csv):
import adagenes as av
av.write_file("/data/somaticMutations.annotated.csv", bframe, file_type="csv")
Dependencies
- scikit-learn
- pandas
- matplotlib
- plotly
- pyliftover
- blosum
- openpyxl
- requests
License
GPLv3
Documentation