Generic toolkit for processing DNA polymorphism data

These details have not been verified by PyPI

Project links

Homepage

Project description

AdaGenes

AdaGenes is a generic toolkit for processing, annotating, filtering and transforming DNA polymorphism data.

Main features:

A powerful data object to store and edit DNA mutation data
Functionality to read and write files in common genomics file formats, including VCF, MAF, CSV/TSV, XLSX and plain text files
Effective variant filtering according to specific threshold or feature values
Liftover of genome positions between the hg38/GRCh38, hg19/GRCh37 and T2T-CHM13 reference genomes
Effective variant normalization in VCF and HGVS notation

Installation

pip install adagenes

Getting started

Read files

Start by reading a data file in one of the supported file formats into a biomarker frame with the read_file() function. AdaGenes automatically identifies the file type and initiates the corresponding file reader. You may also initiate a file reader manually and call its read_file() function:

import adagenes as av

bframe = av.read_file("data/somaticMutations.vcf")

# Print biomarker identifiers
print(bframe.get_ids())

# Print loaded variant data completely
print(bframe.data)

If the variant data has been parsed correctly, the data of the biomarker frame is a nested dictionary keyed by variant identifier:

{
'chr7:140753336A>T': {'variant_data': {'CHROM': '7', 'POS': '140753336', 'ID': '.', 'REF': 'A', 'ALT': 'T', 'QUAL': '100', ... } },
'chr1:2556664C>.': {'variant_data': {'CHROM': '1', 'POS': '2556664', 'ID': '.', ... } }
}
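To make the layout above concrete, the following minimal sketch walks such a dictionary in plain Python and splits each identifier into its parts. It does not call AdaGenes itself: the helper names parse_variant_id and summarize and the regular expression are assumptions made for illustration, and the pattern covers only the identifier shape shown above (chr, chromosome, colon, position, REF, ">", ALT). Fields that are elided in the example output are left out.

```python
import re

# Example data copied from the structure shown above; fields elided there are omitted.
data = {
    "chr7:140753336A>T": {
        "variant_data": {"CHROM": "7", "POS": "140753336", "ID": ".",
                         "REF": "A", "ALT": "T", "QUAL": "100"}
    },
    "chr1:2556664C>.": {
        "variant_data": {"CHROM": "1", "POS": "2556664", "ID": "."}
    },
}

# Hypothetical pattern: matches identifiers such as chr7:140753336A>T only.
ID_PATTERN = re.compile(
    r"^chr(?P<chrom>[^:]+):(?P<pos>\d+)(?P<ref>[ACGTN.]+)>(?P<alt>[ACGTN.]+)$"
)


def parse_variant_id(variant_id):
    """Split an identifier into chromosome, position (int), REF and ALT."""
    match = ID_PATTERN.match(variant_id)
    if match is None:
        raise ValueError(f"Unrecognized variant identifier: {variant_id}")
    parts = match.groupdict()
    parts["pos"] = int(parts["pos"])
    return parts


def summarize(data):
    """Return one (chrom, pos, ref, alt) tuple per variant, in insertion order."""
    rows = []
    for variant_id in data:
        parts = parse_variant_id(variant_id)
        rows.append((parts["chrom"], parts["pos"], parts["ref"], parts["alt"]))
    return rows


for row in summarize(data):
    print(row)
```

With a real biomarker frame, the same loop could be run over bframe.data, provided the identifiers follow the shape assumed here.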

Variant notations and normalization

Filter mutations
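
AdaGenes' own filter functions are not shown here. As a minimal sketch of threshold-based filtering, assuming the nested dictionary layout described under "Read files", variants can be selected in plain Python by a numeric field of variant_data. The helper filter_by_min_value is hypothetical and not part of the AdaGenes API, and apart from the first entry the example records and their QUAL values are made up for illustration.

```python
def filter_by_min_value(data, field, threshold):
    """Keep variants whose numeric `field` in variant_data is >= threshold.

    Variants that lack the field, or whose value is not numeric
    (for example '.'), are dropped.
    """
    kept = {}
    for variant_id, record in data.items():
        raw = record.get("variant_data", {}).get(field)
        try:
            value = float(raw)
        except (TypeError, ValueError):
            # Missing (None) or non-numeric value: skip this variant.
            continue
        if value >= threshold:
            kept[variant_id] = record
    return kept


# Illustrative records: only the first QUAL value appears in the example above.
data = {
    "chr7:140753336A>T": {"variant_data": {"CHROM": "7", "POS": "140753336", "QUAL": "100"}},
    "chr2:1000G>A": {"variant_data": {"CHROM": "2", "POS": "1000", "QUAL": "12"}},
    "chr3:2000C>T": {"variant_data": {"CHROM": "3", "POS": "2000", "QUAL": "."}},
}

high_quality = filter_by_min_value(data, "QUAL", 50)
print(list(high_quality))
```

Returning a new dictionary rather than deleting entries in place leaves the original data untouched, so several thresholds can be tried on the same input.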

Liftover

Convert the genomic positions of variants between genome assemblies with the liftover function (GRCh37 / GRCh38 / T2T-CHM13):

import adagenes as ag

# Load a biomarker frame by defining the genome version (hg19/hg38/t2t)
infile = "somaticMutations.vcf"
bframe = ag.read_file(infile, genome_version="hg38")

# Liftover to another genome assembly
bframe_t2t = ag.liftover(bframe, target_genome="t2t")

# Write the new biomarker frame in T2T to a file
ag.write_file("somaticMutations.t2t.vcf", bframe_t2t)

Visualization

Annotate variants

You can easily annotate variant data by combining an AdaGenes biomarker frame with the Onkopus annotation framework:

pip install onkopus

Annotate the variant data of a biomarker frame by calling an Onkopus client directly on bframe.data:

import adagenes as av
import onkopus as op

# Genome version shared by the file reader and the annotation clients
genome_version = "hg38"
bframe = av.read_file("somaticMutations.vcf", genome_version=genome_version)

# Annotate with all Onkopus modules
bframe.data = op.annotate(bframe.data)

# Annotate with specific modules
bframe.data = op.AlphaMissenseClient(genome_version=genome_version).process_data(bframe.data)
bframe.data = op.GENCODEClient(genome_version=genome_version).process_data(bframe.data)

# Write the annotated biomarker frame to a file
av.write_file("somaticMutations.annotated.avf", bframe)

Save data

Write a biomarker frame to a file with write_file() in one of the supported file formats (.vcf, .maf, .csv):

import adagenes as av

# File extension and file_type should name the same format
av.write_file("/data/somaticMutations.annotated.csv", bframe, file_type="csv")

Dependencies

scikit-learn
pandas
matplotlib
plotly
pyliftover
blosum
openpyxl
requests

License

GPLv3

Documentation

Release history

0.1.9: Oct 8, 2024
0.1.8: Oct 8, 2024
0.1.7: Oct 8, 2024
0.1.6: Oct 7, 2024
0.1.5: Oct 1, 2024
0.1.4: Sep 26, 2024 (this version)
0.1.2: Sep 26, 2024

Download files

Source Distribution

adagenes-0.1.4.tar.gz (12.5 MB)

Uploaded Sep 26, 2024 Source

Built Distribution

adagenes-0.1.4-py3-none-any.whl (410.2 kB)

Uploaded Sep 26, 2024 Python 3

Hashes for adagenes-0.1.4.tar.gz
Algorithm	Hash digest
SHA256	`831cad50ac296c195719288370e13003ff9454433b6c347df427805c1265cc49`
MD5	`33edb8f9744190a7c646dceb354a09da`
BLAKE2b-256	`88fa4be93226bfa14369ac0979f760ba604df6b538510f5b8918d246780081ba`

Hashes for adagenes-0.1.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`6268ac74133c355b5dbe447cbc050220cf47b3aeb72b911edf95c9f77c6b123e`
MD5	`62d80d7986692982933d27c736b13f5d`
BLAKE2b-256	`bd9727b95fda16e0f0e1a0bb6a05f32185fe5e5545730d83accfdb3eb1d6939a`