Generic toolkit for processing DNA polymorphism data

AdaGenes

AdaGenes is a generic toolkit for processing, annotating, filtering and transforming DNA polymorphism data.

Main features:

  • A powerful data object to store and edit DNA mutation data
  • Functionality to read and write files in common genomics file formats, including VCF, MAF, CSV/TSV, XLSX and plain text files
  • Effective variant filtering according to specific threshold or feature values
  • Liftover genome positions between hg38/GRCh38, hg19/GRCh37 and T2T-CHM13 reference genomes
  • Effective variant normalization in VCF and HGVS notation

Installation

AdaGenes can be used as a Python package or directly from the command line. Install it from PyPI with pip:

pip install adagenes

Getting started

Reading files

Start by reading a data file in one of the supported file formats into a biomarker frame with the read_file() function. AdaGenes automatically identifies the file type and initiates the corresponding file reader. You may also initiate a file reader manually and call its read_file() function:

import adagenes as ag

bframe = ag.read_file("data/somaticMutations.vcf")

# Print biomarker identifiers
print(bframe.get_ids())

# Print loaded variant data completely
print(bframe.data)
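
How AdaGenes identifies the file type is not described here. Purely as an illustration of the idea, the sketch below guesses a format from the file extension. The function name guess_file_type and the extension table are hypothetical and are not part of the AdaGenes API.

```python
from pathlib import Path

# Hypothetical mapping from file extension to format name (not the AdaGenes API).
EXTENSION_TO_FORMAT = {
    ".vcf": "vcf",
    ".maf": "maf",
    ".csv": "csv",
    ".tsv": "tsv",
    ".xlsx": "xlsx",
    ".txt": "txt",
}

def guess_file_type(path):
    """Return a format name based on the file extension, or None if unknown."""
    suffix = Path(path).suffix.lower()
    return EXTENSION_TO_FORMAT.get(suffix)

print(guess_file_type("data/somaticMutations.vcf"))  # prints "vcf"
```

A real reader would likely also inspect the file content (for example a "##fileformat=VCF" header), since extensions can be wrong.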

Instead of loading a variant file, you may also create a biomarker frame manually at genomic or protein level:

import adagenes as ag

# create biomarker frame based on variants at genomic level
bframe = ag.BiomarkerFrame(data=["chr7:g.140753336A>T"])

If the variant data has been parsed correctly, the data of the biomarker frame is a nested dictionary keyed by variant identifier:

{
'chr7:140753336A>T': {'variant_data': {'CHROM': '7', 'POS': '140753336', 'ID': '.', 'REF': 'A', 'ALT': 'T', 'QUAL': '100', ... } },
'chr1:2556664C>.': {'variant_data': {'CHROM': '1', 'POS': '2556664', 'ID': '.', ... } }
}
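
To make the relation between a genomic identifier such as "chr7:g.140753336A>T" and the CHROM, POS, REF and ALT fields concrete, here is a minimal sketch using a regular expression. It is not the parser AdaGenes uses; parse_snv and SNV_PATTERN are hypothetical names, and only single-base substitutions are handled.

```python
import re

# Pattern for a simple genomic substitution such as "chr7:g.140753336A>T".
# Hypothetical helper for illustration; the "chr" and "g." prefixes are optional.
SNV_PATTERN = re.compile(
    r"^(?:chr)?([0-9]{1,2}|X|Y|M|MT):(?:g\.)?([0-9]+)([ACGT])>([ACGT])$"
)

def parse_snv(identifier):
    """Split a genomic SNV identifier into VCF-style fields."""
    match = SNV_PATTERN.match(identifier)
    if match is None:
        raise ValueError(f"Unsupported variant notation: {identifier!r}")
    chrom, pos, ref, alt = match.groups()
    return {"CHROM": chrom, "POS": pos, "REF": ref, "ALT": alt}

record = parse_snv("chr7:g.140753336A>T")
print(record)
# {'CHROM': '7', 'POS': '140753336', 'REF': 'A', 'ALT': 'T'}
```

Identifiers with a missing alternate allele, such as 'chr1:2556664C>.', are deliberately rejected by this sketch.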

Liftover

Convert the genomic positions of variants between genome assemblies (GRCh37, GRCh38, T2T-CHM13) with the liftover function.

For large variant files, you can use the AdaGenes process_file() function for stream-based processing:

import adagenes as ag

infile = "somaticMutations.vcf"
outfile = "somaticMutations.t2t.vcf"

client = ag.LiftoverClient(genome_version="hg19", target_genome="t2t")
ag.process_file(infile, outfile, client)
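
Stream-based processing means that the file is read and written one line at a time, so memory use stays roughly constant regardless of file size. The following is a minimal sketch of that idea on in-memory text streams. It is not the implementation of process_file(); stream_vcf and shift_position are hypothetical names, and the transform is a toy offset rather than a real liftover.

```python
import io

def stream_vcf(src, dst, transform):
    """Copy VCF lines from src to dst, applying transform to each data line.

    Header lines (starting with '#') are passed through unchanged.
    Only one line is held in memory at a time.
    """
    for line in src:
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("#"):
            dst.write(line)
            continue
        fields = line.rstrip("\n").split("\t")
        dst.write("\t".join(transform(fields)) + "\n")

def shift_position(fields, offset=100):
    """Toy transform: add a fixed offset to POS (column 2). Not a real liftover."""
    fields = list(fields)
    fields[1] = str(int(fields[1]) + offset)
    return fields

src = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\n"
    "7\t140753336\t.\tA\tT\n"
)
dst = io.StringIO()
stream_vcf(src, dst, shift_position)
print(dst.getvalue())
```

The same function works unchanged with real file handles opened via open(), which is what makes the pattern suitable for large files.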

For small to medium sized variant files, you can load and edit the variant data as a biomarker frame:

import adagenes as ag

# Load a biomarker frame by defining the genome version (hg19/hg38/t2t)
infile = "somaticMutations.vcf"
bframe = ag.read_file(infile, genome_version="hg38")

# Liftover to another genome assembly
bframe_t2t = ag.liftover(bframe, target_genome="t2t")

# Write the new biomarker frame in T2T to a file
ag.write_file("somaticMutations.t2t.vcf", bframe_t2t)

Filter mutations
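
The AdaGenes filtering functions are not shown in this description. As a minimal sketch of threshold-based filtering, assuming the nested dictionary layout shown under "Reading files", the snippet below keeps variants whose QUAL value reaches a cutoff. It operates on a plain dictionary; filter_by_qual and the sample QUAL values are hypothetical.

```python
def filter_by_qual(data, min_qual):
    """Keep variants whose QUAL field is numeric and at least min_qual."""
    kept = {}
    for var_id, record in data.items():
        qual = record.get("variant_data", {}).get("QUAL", ".")
        try:
            if float(qual) >= min_qual:
                kept[var_id] = record
        except (TypeError, ValueError):
            # Missing or non-numeric QUAL (e.g. "."): drop the variant.
            continue
    return kept

# Hypothetical sample data in the biomarker frame layout.
data = {
    "chr7:140753336A>T": {"variant_data": {"CHROM": "7", "POS": "140753336", "QUAL": "100"}},
    "chr1:2556664C>.": {"variant_data": {"CHROM": "1", "POS": "2556664", "QUAL": "20"}},
}
print(list(filter_by_qual(data, 50)))  # ['chr7:140753336A>T']
```

With a biomarker frame, the same function could be applied to bframe.data, provided the data follows this layout.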

Annotate variants

Use Onkopus to annotate variants from Python, e.g.:

import adagenes as ag
import onkopus as op

bframe = ag.read_file("somaticMutations.vcf", genome_version="hg38")

bframe.data = op.PathogenicityClient(genome_version="hg38").process_data(bframe.data)

ag.write_file("somaticMutations.annotated.vcf", bframe)

For further details on how to annotate variants, check out the Onkopus documentation.

Variant notations and normalization

Visualization

Annotate variants

You can easily annotate variant data by combining an AdaGenes biomarker frame with the Onkopus annotation framework:

pip install onkopus

Annotate the variant data of a biomarker frame by calling an Onkopus client directly on the bframe.data:

import adagenes as av
import onkopus as op

genome_version = "hg38"
bframe = av.read_file("somaticMutations.vcf", genome_version=genome_version)

# Annotate with all Onkopus modules
bframe.data = op.annotate(bframe.data)

# Annotate with specific modules
bframe.data = op.AlphaMissenseClient(genome_version=genome_version).process_data(bframe.data)
bframe.data = op.GENCODEClient(genome_version=genome_version).process_data(bframe.data)

av.write_file("somaticMutations.annotated.avf", bframe)

Saving data

Write a biomarker frame to a file with write_file() in one of the supported file formats (.vcf, .maf, .csv):

import adagenes as ag

ag.write_file("/data/somaticMutations.annotated.csv", bframe, file_type="csv")
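
To illustrate what a tabular export of variant data might contain, the sketch below flattens the nested dictionary layout from "Reading files" into CSV text with the standard csv module. This is an assumption about the general shape of such output, not the format that write_file() produces; to_csv_text and its column selection are hypothetical.

```python
import csv
import io

def to_csv_text(data, columns=("CHROM", "POS", "REF", "ALT")):
    """Flatten {variant_id: {"variant_data": {...}}} into CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["ID", *columns], lineterminator="\n")
    writer.writeheader()
    for var_id, record in data.items():
        fields = record.get("variant_data", {})
        row = {"ID": var_id}
        # Missing fields are written as ".", following the VCF convention.
        row.update({col: fields.get(col, ".") for col in columns})
        writer.writerow(row)
    return buffer.getvalue()

data = {
    "chr7:140753336A>T": {
        "variant_data": {"CHROM": "7", "POS": "140753336", "REF": "A", "ALT": "T"}
    }
}
print(to_csv_text(data))
```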

Dependencies

  • scikit-learn
  • pandas
  • matplotlib
  • plotly
  • pyliftover
  • blosum
  • openpyxl
  • requests

License

GPLv3
