Generic toolkit for processing DNA polymorphism data

AdaGenes

AdaGenes is a generic toolkit for processing, annotating, filtering and transforming DNA polymorphism data.

Main features:

  • A powerful data object to store and edit DNA mutation data
  • Functionality to read and write files in common genomics file formats, including VCF, MAF, CSV/TSV, XLSX and plain text files
  • Effective variant filtering according to specific threshold or feature values
  • Liftover genome positions between hg38/GRCh38, hg19/GRCh37 and T2T-CHM13 reference genomes
  • Effective variant normalization in VCF and HGVS notation
  • VCF conversions: Generate VCF files from your own custom formatted CSV or Excel files

The AdaGenes server

AdaGenes can be used both as a web application and as a Python package.

To install the AdaGenes server, clone the repository. Then change into the directory, install the dependencies and start the Flask server:

git clone https://gitlab.gwdg.de/MedBioinf/mtb/adagenes.git
cd adagenes
pip install -r requirements.txt
python3 flask_app.py

Return to your main directory, then clone the AdaGenes web frontend (https://gitlab.gwdg.de/MedBioinf/mtb/adagenes-front-end):

cd ..
git clone https://gitlab.gwdg.de/MedBioinf/mtb/adagenes-front-end.git

To start the AdaGenes front end manually, change into the directory of the repository and start the Vue.js application:

cd adagenes-front-end
npm run dev

To start the server and the web front end using Docker, build and run the Docker containers:

cd adagenes
docker build -t adagenes-server -f Dockerfile .
docker run --name adagenes-server adagenes-server

cd ../adagenes-front-end
docker build -t adagenes-front-end -f Dockerfile .
docker run --name adagenes-front-end adagenes-front-end

For detailed installation instructions for the AdaGenes front end, see its README (https://gitlab.gwdg.de/MedBioinf/mtb/adagenes-front-end/-/blob/main/README.md?ref_type=heads).

Configuration

By default, AdaGenes accesses the public modules of Onkopus and SeqCAT to annotate variants. To use AdaGenes entirely locally, install Onkopus and SeqCAT locally (https://gitlab.gwdg.de/MedBioinf/mtb/onkopus/onkopus, https://gitlab.gwdg.de/MedBioinf/mtb/seqcat).

To configure the AdaGenes server to use locally installed Onkopus annotation modules, define the Onkopus environment variables ONKOPUS_MODULE_PROTOCOL, ONKOPUS_MODULE_SERVER and ONKOPUS_PORTS_ACTIVE:

# Run AdaGenes using Docker
docker run --env ONKOPUS_MODULE_PROTOCOL=http --env ONKOPUS_MODULE_SERVER=localhost --env ONKOPUS_PORTS_ACTIVE=1 --name adagenes-server adagenes-server

# Run AdaGenes locally
export ONKOPUS_MODULE_PROTOCOL=http
export ONKOPUS_MODULE_SERVER=localhost
export ONKOPUS_PORTS_ACTIVE=1
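
Before starting the server locally, it can help to confirm that the three variables are actually visible to the process. The following is a minimal sketch using only the Python standard library; the helper name missing_onkopus_vars is hypothetical and not part of AdaGenes, and the sketch makes no assumption about how AdaGenes interprets the values.

```python
import os

# Variable names are taken from the text above; the helper itself is hypothetical.
ONKOPUS_VARS = (
    "ONKOPUS_MODULE_PROTOCOL",
    "ONKOPUS_MODULE_SERVER",
    "ONKOPUS_PORTS_ACTIVE",
)

def missing_onkopus_vars(environ=None):
    """Return the names of the Onkopus variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in ONKOPUS_VARS if not env.get(name)]

# Check the environment of the current process.
missing = missing_onkopus_vars()
if missing:
    print("Not set:", ", ".join(missing))
else:
    print("All Onkopus variables are set.")
```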

To install Onkopus locally, see the Onkopus tutorial (https://mtb.bioinf.med.uni-goettingen.de/onkopus/docs).

Use the AdaGenes command line tool

Filter files

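AdaGenes comes with a command line client that can annotate and filter variant data. To filter a variant file, use the filter subcommand with the -f option and a filter expression, and pass the input and output files with -i and -o. The examples below chain three filters, each reading the output of the previous one: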

adagenes filter -f "dbnsfp_REVEL_Score > 0.8" -i somaticMutations.vcf -o somaticMutations.filtered_revel.vcf

adagenes filter -f "CHROM in [0-9|X|Y]" -i somaticMutations.filtered_revel.vcf -o somaticMutations.filtered_chrom.vcf

adagenes filter -f "clinvar_CLNSIG contains 'Pathogenic'" -i somaticMutations.filtered_chrom.vcf -o somaticMutations.filtered_clinvar.vcf

The following operators can be used in the filter expression:

  • '=': Features that exactly match a specific value (clinvar="Pathogenic")
  • '>', '<': Features with greater or lesser values (revel > 0.8)
  • 'in [REGEX]': Values matching a regular expression (CHROM in [0-9|X|Y])
  • 'Equals'
  • 'notEquals'
  • 'lessThan'
  • 'lessThanOrEqual'
  • 'greaterThan'
  • 'greaterThanOrEqual'
  • 'contains'
  • 'notContains'
  • 'beginsWith'
  • 'endsWith'
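
To make the semantics of the named operators concrete, the sketch below maps them to plain Python predicates and applies them to a list of dictionaries. It is an illustration only, not the AdaGenes implementation: the OPERATORS table, the keep() helper, the sample records and the regular expression "[0-9]+|X|Y" are all assumptions made for this example.

```python
import operator
import re

# Hypothetical mapping from the operator names listed above to Python predicates.
OPERATORS = {
    "Equals": operator.eq,
    "notEquals": operator.ne,
    "lessThan": operator.lt,
    "lessThanOrEqual": operator.le,
    "greaterThan": operator.gt,
    "greaterThanOrEqual": operator.ge,
    "contains": lambda value, arg: arg in value,
    "notContains": lambda value, arg: arg not in value,
    "beginsWith": lambda value, arg: value.startswith(arg),
    "endsWith": lambda value, arg: value.endswith(arg),
    # 'in' is treated here as a full regular-expression match (an assumption).
    "in": lambda value, pattern: re.fullmatch(pattern, value) is not None,
}

def keep(record, field, op, arg):
    """Return True if record[field] satisfies the comparison; missing fields fail."""
    if field not in record:
        return False
    value = record[field]
    if isinstance(arg, (int, float)):
        # Numeric comparison: convert the stored string, reject non-numeric values.
        try:
            value = float(value)
        except (TypeError, ValueError):
            return False
    return OPERATORS[op](value, arg)

# Made-up records for illustration only.
records = [
    {"CHROM": "7", "dbnsfp_REVEL_Score": "0.93", "clinvar_CLNSIG": "Pathogenic"},
    {"CHROM": "GL000220.1", "dbnsfp_REVEL_Score": "0.12", "clinvar_CLNSIG": "Benign"},
]

high_revel = [r for r in records if keep(r, "dbnsfp_REVEL_Score", "greaterThan", 0.8)]
canonical = [r for r in records if keep(r, "CHROM", "in", r"[0-9]+|X|Y")]
pathogenic = [r for r in records if keep(r, "clinvar_CLNSIG", "contains", "Pathogenic")]
```

Treating a missing or non-numeric field as a failed comparison is one possible design choice; a real filter may handle such records differently.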

Use AdaGenes in Python

To use AdaGenes as a Python package, install it from PyPI:

pip install adagenes

Reading files

Start by reading a data file in one of the supported formats into a biomarker frame with the read_file() function. AdaGenes automatically identifies the file type and instantiates the corresponding file reader. You may also instantiate a file reader manually and call its read_file() function.

import adagenes as ag

bframe = ag.read_file("data/somaticMutations.vcf")

# Print biomarker identifiers
print(bframe.get_ids())

# Print loaded variant data completely
print(bframe.data)

Instead of loading a variant file, you may also create a biomarker frame manually at genomic or protein level:

import adagenes as ag

# create biomarker frame based on variants at genomic level
bframe = ag.BiomarkerFrame(data=["chr7:g.140753336A>T"])
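
The identifier passed above follows the genomic HGVS style (chromosome, "g.", position, reference base, ">", alternate base). As a rough illustration of that notation, the following standalone sketch splits such an identifier with a regular expression. The pattern and the parse_snv() helper are assumptions for this example, cover only single-nucleotide substitutions, and are not how AdaGenes parses its input.

```python
import re

# Simplified pattern for single-nucleotide substitutions such as "chr7:g.140753336A>T".
# This is an illustration, not a complete HGVS parser.
SNV_PATTERN = re.compile(
    r"^(?:chr)?(?P<chrom>[0-9]{1,2}|X|Y|M|MT)"
    r":g\.(?P<pos>[0-9]+)(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)

def parse_snv(identifier):
    """Split a genomic SNV identifier into chromosome, position, ref and alt."""
    match = SNV_PATTERN.match(identifier)
    if match is None:
        raise ValueError(f"Not a genomic SNV identifier: {identifier!r}")
    parts = match.groupdict()
    parts["pos"] = int(parts["pos"])
    return parts

print(parse_snv("chr7:g.140753336A>T"))
# {'chrom': '7', 'pos': 140753336, 'ref': 'A', 'alt': 'T'}
```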

If the variant data has been parsed correctly, the data of the biomarker frame should be a nested JSON dictionary:

{
'chr7:140753336A>T': {'variant_data': {'CHROM': '7', 'POS': '140753336', 'ID': '.', 'REF': 'A', 'ALT': 'T', 'QUAL': '100', ... },
'chr1:2556664C>.': {'variant_data': {'CHROM': '1', 'POS': '2556664', 'ID': '.', ... } }
}
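
Because the data is an ordinary nested dictionary, it can be inspected with plain Python. The sketch below copies only the fields shown above (elided fields are left out) and collects chromosome and position per variant; the positions() helper is hypothetical and not part of AdaGenes.

```python
# Sample data copied from the fields shown above; elided fields are left out.
data = {
    "chr7:140753336A>T": {
        "variant_data": {"CHROM": "7", "POS": "140753336", "ID": ".",
                         "REF": "A", "ALT": "T", "QUAL": "100"}
    },
    "chr1:2556664C>.": {
        "variant_data": {"CHROM": "1", "POS": "2556664", "ID": "."}
    },
}

def positions(data):
    """Return (chromosome, position) tuples, sorted by chromosome label and position."""
    rows = []
    for entry in data.values():
        fields = entry.get("variant_data", {})
        # Skip entries that lack either coordinate.
        if "CHROM" in fields and "POS" in fields:
            rows.append((fields["CHROM"], int(fields["POS"])))
    return sorted(rows)

print(positions(data))
# [('1', 2556664), ('7', 140753336)]
```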

Liftover

Convert the genomic positions of variants between genome assemblies (GRCh37, GRCh38 and T2T-CHM13) with the liftover function.

For large variant files, you can use the AdaGenes process_file() function for stream-based processing:

import adagenes as ag

infile = "somaticMutations.vcf"
outfile = "somaticMutations.t2t.vcf"

client = ag.LiftoverClient(genome_version="hg19", target_genome="t2t")
ag.process_file(infile, outfile, client)
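
The idea behind stream-based processing is that each line is read, transformed and written before the next one is loaded, so memory use does not grow with file size. The following generic sketch shows that pattern with the standard library only. It is not the implementation of process_file(); the stream_vcf() and add_chr_prefix() helpers and the toy transformation are assumptions for illustration.

```python
import io

def stream_vcf(lines, transform):
    """Yield header lines unchanged and data lines passed through transform()."""
    for line in lines:
        if line.startswith("#"):
            yield line
        else:
            fields = line.rstrip("\n").split("\t")
            yield "\t".join(transform(fields)) + "\n"

def add_chr_prefix(fields):
    """Toy transformation: prefix the CHROM column with 'chr' if it is missing."""
    if not fields[0].startswith("chr"):
        fields[0] = "chr" + fields[0]
    return fields

# An in-memory stand-in for a VCF file; a real file object works the same way.
source = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\n"
    "7\t140753336\t.\tA\tT\n"
)
result = "".join(stream_vcf(source, add_chr_prefix))
print(result, end="")
```

With real files, the generator can be consumed by writing each yielded line directly to an open output file instead of joining the lines in memory.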

For small to medium sized variant files, you can load and edit the variant data as a biomarker frame:

import adagenes as ag

# Load a biomarker frame by defining the genome version (hg19/hg38/t2t)
infile = "somaticMutations.vcf"
bframe = ag.read_file(infile, genome_version="hg38")

# Liftover to another genome assembly
bframe_t2t = ag.liftover(bframe, target_genome="t2t")

# Write the new biomarker frame in T2T to a file
ag.write_file("somaticMutations.t2t.vcf", bframe_t2t)

Annotate variants

Use Onkopus to annotate the variants of a biomarker frame in Python, e.g.

import adagenes as ag
import onkopus as op

bframe = ag.read_file("somaticMutations.vcf", genome_version="hg38")

bframe.data = op.PathogenicityClient(genome_version="hg38").process_data(bframe.data)

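# File path first, then the biomarker frame (same argument order as the other write_file() examples)
ag.write_file("somaticMutations.annotated.vcf", bframe)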

For further details on how to annotate variants, check out the Onkopus documentation.

Annotate variants

You can easily annotate variant data by combining an AdaGenes biomarker frame with the Onkopus annotation framework:

pip install onkopus

Annotate the variant data of a biomarker frame by calling an Onkopus client directly on the bframe.data:

import adagenes as ag
import onkopus as op

genome_version = "hg38"
bframe = ag.read_file("somaticMutations.vcf", genome_version=genome_version)

# Annotate with all Onkopus modules
bframe.data = op.annotate(bframe.data)

# Annotate with specific modules
bframe.data = op.AlphaMissenseClient(genome_version=genome_version).process_data(bframe.data)
bframe.data = op.GENCODEClient(genome_version=genome_version).process_data(bframe.data)

ag.write_file("somaticMutations.annotated.avf", bframe)

Saving data

Write a biomarker frame to a file in one of the supported formats (.vcf, .maf, .csv) with write_file(). The example assumes that bframe is a biomarker frame that has already been loaded or created:

import adagenes as ag

ag.write_file("/data/somaticMutations.annotated.maf", bframe, file_type="csv")

Documentation

Detailed documentation on how to use the AdaGenes web application, the CLI and the Python package can be found on the public AdaGenes website.

Dependencies

  • scikit-learn
  • pandas
  • matplotlib
  • plotly
  • pyliftover
  • blosum
  • openpyxl
  • requests

License

MIT
