Skip to main content

Python interface to ensembl reference genome metadata

Project description

pyensembl

Python interface to Ensembl reference genome metadata (exons, transcripts, &c)

from pyensembl import EnsemblRelease

# release 77 uses human reference genome GRCh38
data = EnsemblRelease(77)

# will return ['HLA-A']
gene_names = data.gene_names_at_locus(contig=6, position=29945884)

# get all exons associated with HLA-A
exon_ids  = data.exon_ids_of_gene_name('HLA-A')

API

The EnsemblRelease object has methods to let you access all possible combinations of the annotation features gene_name, gene_id, transcript_name, transcript_id, exon_id as well as the location of these genomic elements (contig, start position, end position, strand).

Gene Names

gene_names() : returns all gene names in the annotation database

gene_names_on_contig(contig) : all gene names on a particular chromosome/contig

gene_names_at_locus(contig, position, end=None, strand=None) : names of genes overlapping with the given locus (returns a list to account for overlapping genes)

gene_name_of_gene_id(gene_id) : name of gene with given ID

gene_name_of_transcript_id(transcript_id) : name of gene associated with given transcript ID

gene_name_of_transcript_name(transcript_name) : name of gene associated with given transcript name

gene_name_of_exon_id(exon_id) : name of gene associated with given exon ID

Gene IDs

gene_ids(contig=None, strand=None) : all gene IDs in the annotation database

gene_id_of_gene_name(gene_name) : translate Ensembl gene ID to its corresponding name

Transcript Names

transcript_names(contig=None, strand=None) : all transcript names in the annotation database

Transcript IDs

transcript_ids(contig=None, strand=None) : returns all transcript IDs in the annotation database

transcript_ids_of_gene_id(gene_id) : return IDs of all transcripts associated with given gene ID

transcript_ids_of_gene_name(gene_name) : return IDs of all transcripts associated with given gene name

transcript_id_of_transcript_name(transcript_name) : translate transcript name to its ID

transcript_ids_of_exon_id(exon_id) : return IDs of all transcripts associatd with given exon ID

Exon IDs

exon_ids(contig=None, strand=None) : returns all transcript IDs in the annotation database

exon_ids_of_gene_id(gene_id)

exon_ids_of_gene_name(gene_name)

exon_ids_of_transcript_name(transcript_name)

exon_ids_of_transcript_id(transcript_id)

Locations

These functions currently assume that each gene maps to a single unique location, which is invalid both with heavily copied genes (e.g. U1) and with polymorphic regions (e.g. HLA genes).

location_of_gene_name(gene_name)

location_of_gene_id(gene_id)

location_of_transcript_id(transcript_id)

location_of_exon_id(exon_id)

Start Codons

start_codon_of_transcript_id

start_codon_of_transcript_name

Stop Codons

stop_codon_of_transcript_id

stop_codon_of_transcript_name

Project details


Release history Release notifications | RSS feed

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pyensembl-0.5.2.tar.gz (41.6 kB view hashes)

Uploaded Source

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page