Sequence Annotation

Python package for annotating gene features

Overview

The seqann package annotates gene features in consensus sequences. To create an annotation, pass a consensus sequence to the annotate method of the BioSeqAnn class. No parameters are required when initializing a BioSeqAnn object, but annotation is significantly faster with a BioSQL database. When no BioSQL database is provided, the latest hla.dat file is downloaded and parsed. A BioSQL database containing all of IPD-IMGT/HLA is available on DockerHub and can be run on any machine that has Docker installed.

Install

pip install seq-ann

Parameters

Below is the list of parameters and the default values used when initializing a BioSeqAnn object.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| server | BioSeqDatabase | None | A BioSQL database containing all of the sequence data from IPD-IMGT/HLA. |
| dbversion | str | Latest | The IPD-IMGT/HLA or KIR database release. |
| datfile | str | None | The IPD-IMGT/HLA or KIR dat file to use in place of the server parameter. |
| kir | bool | False | Flag indicating that the input sequences are from the KIR gene system. |
| align | bool | False | Flag for producing the alignments along with the annotations. |
| verbose | bool | False | Flag for running in verbose mode. |
| verbosity | int | None | Numerical value indicating how verbose the output will be in verbose mode. |
| debug | Dict | None | A dictionary with process names as keys and verbosity levels as values. |
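The parameters listed above are passed to BioSeqAnn as keyword arguments. The sketch below is one hedged way to catch misspelled parameter names before the object is built. The helpers `seqann_options` and `make_annotator` are hypothetical and not part of seqann, and the value `verbosity=1` is an assumed example, since the accepted range is not stated here.

```python
# Parameter names copied from the parameter list above.
KNOWN_PARAMETERS = {
    "server", "dbversion", "datfile", "kir",
    "align", "verbose", "verbosity", "debug",
}


def seqann_options(**overrides):
    """Return the overrides after checking that every name is documented.

    Hypothetical helper, not part of seqann.
    """
    unknown = sorted(set(overrides) - KNOWN_PARAMETERS)
    if unknown:
        raise TypeError("unknown BioSeqAnn parameter(s): " + ", ".join(unknown))
    return dict(overrides)


def make_annotator(**overrides):
    """Build a BioSeqAnn, leaving every parameter not given at its default.

    Hypothetical helper; the import is done lazily and needs seq-ann installed.
    """
    from seqann import BioSeqAnn
    return BioSeqAnn(**seqann_options(**overrides))


# verbosity=1 is an assumed example value.
options = seqann_options(align=True, verbose=True, verbosity=1)
print(options)
```

Only the parameters that differ from the defaults need to be named, so the default values never have to be repeated in calling code.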

Usage

To annotate a sequence, initialize a new BioSeqAnn object and pass the sequence to its annotate method. The sequence must be a Biopython Seq. The locus of the sequence is not required, but providing it improves the accuracy of the annotation.

The packages ncbi-blast+ and clustalo are required to be installed on your system.

If you are using BioSQL, set the BIOSQLHOST and BIOSQLPORT environment variables to the host and port of the database.

export BIOSQLHOST="localhost"
export BIOSQLPORT=3306

from seqann import BioSeqAnn
seqann = BioSeqAnn()
# sequence is a Biopython Seq holding the consensus sequence to annotate
ann = seqann.annotate(sequence, "HLA-A")

A sequence can be annotated with or without a BioSeqDatabase. To use a BioSQL database, initialize a BioSeqDatabase with the parameters that match the database you have running. If you are running the imgt_biosqldb image from DockerHub, the parameters will be the same as the ones below.

from seqann import BioSeqAnn
from BioSQL import BioSeqDatabase
server = BioSeqDatabase.open_database(driver="pymysql", user="root",
                                      passwd="my-secret-pw", host="localhost",
                                      db="bioseqdb", port=3306)
seqann = BioSeqAnn(server=server)
ann = seqann.annotate(sequence, "HLA-A")

You may need to set the environment variables BIOSQLHOST (e.g. “localhost”) and BIOSQLPORT (e.g. 3306) so that they point to your Docker instance.
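If you want the same host and port in your own call to open_database, the two environment variables can be read in one place. This is a minimal sketch using only the standard library. The helper name `biosql_settings` is hypothetical, and the fallback values "localhost" and 3306 are assumptions taken from the examples in this document, not defaults confirmed for seqann.

```python
import os


def biosql_settings(environ=None):
    """Return (host, port) for the BioSQL server.

    Hypothetical helper. Reads BIOSQLHOST and BIOSQLPORT, falling back to
    the example values used in this document when a variable is unset.
    """
    env = os.environ if environ is None else environ
    host = env.get("BIOSQLHOST", "localhost")
    port = int(env.get("BIOSQLPORT", "3306"))  # environment values are strings
    return host, port


# An explicit mapping stands in for os.environ to keep the example repeatable.
host, port = biosql_settings({"BIOSQLHOST": "localhost", "BIOSQLPORT": "3306"})
print(host, port)
```

The returned pair can then be passed as the host and port arguments of BioSeqDatabase.open_database.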

Annotations

{
     'complete_annotation': True,
     'annotation': {'exon_1': SeqRecord(seq=Seq('AGAGACTCTCCCG', SingleLetterAlphabet()), id='HLA:HLA00630', name='HLA:HLA00630', description='HLA:HLA00630 DQB1*03:04:01 597 bp', dbxrefs=[]),
                    'exon_2': SeqRecord(seq=Seq('AGGATTTCGTGTACCAGTTTAAGGCCATGTGCTACTTCACCAACGGGACGGAGC...GAG', SingleLetterAlphabet()), id='HLA:HLA00630', name='HLA:HLA00630', description='HLA:HLA00630 DQB1*03:04:01 597 bp', dbxrefs=[]),
                    'exon_3': SeqRecord(seq=Seq('TGGAGCCCACAGTGACCATCTCCCCATCCAGGACAGAGGCCCTCAACCACCACA...ATG', SingleLetterAlphabet()), id='HLA:HLA00630', name='<unknown name>', description='HLA:HLA00630', dbxrefs=[])},
     'features': {'exon_1': SeqFeature(FeatureLocation(ExactPosition(0), ExactPosition(13), strand=1), type='exon_1'),
                  'exon_2': SeqFeature(FeatureLocation(ExactPosition(13), ExactPosition(283), strand=1), type='exon_2'),
                  'exon_3': SeqFeature(FeatureLocation(ExactPosition(283), ExactPosition(503), strand=1), type='exon_3')},
     'method': 'nt_search and clustalo',
     'gfe': 'HLA-Aw2-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-4',
     'seq': SeqRecord(seq=Seq('AGAGACTCTCCCGAGGATTTCGTGTACCAGTTTAAGGCCATGTGCTACTTCACC...ATG', SingleLetterAlphabet()), id='HLA:HLA00630', name='HLA:HLA00630', description='HLA:HLA00630 DQB1*03:04:01 597 bp', dbxrefs=[])
}
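The coordinates in the features entry follow Biopython's convention: positions are 0-based and the end is exclusive, so they behave like Python slice bounds. The sketch below uses plain tuples and strings rather than Biopython objects to show how the spans from the example relate to the sequence. The helper names are hypothetical, and `seq_prefix` holds only the first bases visible in the example output.

```python
# (start, end) pairs copied from the 'features' entry in the example output.
spans = {"exon_1": (0, 13), "exon_2": (13, 283), "exon_3": (283, 503)}


def feature_lengths(spans):
    """Length of each feature; the end coordinate is exclusive."""
    return {name: end - start for name, (start, end) in spans.items()}


def is_contiguous(spans):
    """True when each feature starts exactly where the previous one ends."""
    ordered = sorted(spans.values())
    return all(prev[1] == nxt[0] for prev, nxt in zip(ordered, ordered[1:]))


# First bases of the annotated sequence, as shown in the 'seq' entry.
seq_prefix = "AGAGACTCTCCCGAGGATTTCG"
start, end = spans["exon_1"]
exon_1 = seq_prefix[start:end]  # same bounds as the FeatureLocation

print(feature_lengths(spans))
print(is_contiguous(spans))
print(exon_1)
```

The slice for exon_1 reproduces the exon_1 sequence shown in the annotation entry, which is a quick consistency check between the two parts of the output.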

Once a sequence has been annotated, the gene features and their corresponding sequences are available in the returned Annotation object. If a complete annotation cannot be produced, nothing is returned. The example below shows how to access and print the features.

ann = seqann.annotate(sequence, "HLA-A")
for feat in ann.annotation:
    print(feat, ann.gfe, str(ann.annotation[feat].seq), sep="\t")
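Because nothing is returned when a complete annotation cannot be produced, a loop like the one above can fail if the result is used without a check. The sketch below wraps the same access pattern in a guard. It assumes that "nothing" means None, and `annotation_rows` is a hypothetical helper. Stand-in objects built with SimpleNamespace replace the real Annotation so the example runs without seqann, and the GFE string is a placeholder.

```python
from types import SimpleNamespace


def annotation_rows(ann):
    """Return one tab-separated row per feature, or [] when ann is None.

    Hypothetical helper. It only relies on the .annotation and .gfe
    attributes used in the loop above.
    """
    if ann is None:  # assumption: a failed annotation is returned as None
        return []
    return [
        "\t".join((feat, ann.gfe, str(record.seq)))
        for feat, record in ann.annotation.items()
    ]


# Stand-in for a returned Annotation object; not a real seqann result.
fake_ann = SimpleNamespace(
    gfe="GFE-PLACEHOLDER",
    annotation={"exon_1": SimpleNamespace(seq="AGAGACTCTCCCG")},
)

rows = annotation_rows(fake_ann)
for row in rows:
    print(row)
```

With a real result, the call would be annotation_rows(seqann.annotate(sequence, "HLA-A")).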
