pyensembl

Python interface to Ensembl reference genome metadata (exons, transcripts, etc.)

from pyensembl import EnsemblRelease

# release 77 uses human reference genome GRCh38
data = EnsemblRelease(77)

# will return ['HLA-A']
gene_names = data.gene_names_at_locus(contig=6, position=29945884)

# get all exons associated with HLA-A
exon_ids = data.exon_ids_of_gene_name('HLA-A')

API

The EnsemblRelease object has methods for translating between the annotation features gene_name, gene_id, transcript_name, transcript_id, and exon_id, and for looking up the location of these genomic elements (contig, start position, end position, strand).

Gene Names

gene_names() : returns all gene names in the annotation database

gene_names_on_contig(contig) : all gene names on a particular chromosome/contig

gene_names_at_locus(contig, position, end=None, strand=None) : names of genes overlapping with the given locus (returns a list to account for overlapping genes)

gene_name_of_gene_id(gene_id) : name of gene with given ID

gene_name_of_transcript_id(transcript_id) : name of gene associated with given transcript ID

gene_name_of_transcript_name(transcript_name) : name of gene associated with given transcript name

gene_name_of_exon_id(exon_id) : name of gene associated with given exon ID
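
To make the overlap semantics of a locus query concrete, here is a minimal sketch over an in-memory table. It is not pyensembl's implementation: the `Gene` record, the `TOY_GENES` data, and the `toy_gene_names_at_locus` helper are hypothetical, the coordinates are made up, and closed (inclusive) intervals with "+"/"-" strand labels are assumptions.

```python
from collections import namedtuple

# Toy records with made-up names and coordinates (not real Ensembl data).
Gene = namedtuple("Gene", ["name", "contig", "start", "end", "strand"])

TOY_GENES = [
    Gene("GENE_A", "6", 100, 500, "+"),
    Gene("GENE_B", "6", 400, 900, "-"),
    Gene("GENE_C", "7", 100, 500, "+"),
]


def toy_gene_names_at_locus(genes, contig, position, end=None, strand=None):
    """Return names of genes overlapping [position, end] on the given contig."""
    if end is None:
        end = position  # single-base query
    contig = str(contig)  # accept 6 as well as "6"
    names = []
    for gene in genes:
        if gene.contig != contig:
            continue
        if strand is not None and gene.strand != strand:
            continue
        # Two closed intervals overlap when each starts before the other ends.
        if gene.start <= end and position <= gene.end:
            names.append(gene.name)
    return names


# Position 450 falls inside both GENE_A and GENE_B, so both names come back.
print(toy_gene_names_at_locus(TOY_GENES, contig=6, position=450))
# Restricting to one strand narrows the result.
print(toy_gene_names_at_locus(TOY_GENES, contig=6, position=450, strand="+"))
```

Returning a list rather than a single name is what lets the query report overlapping genes at the same position.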

Gene IDs

gene_ids(contig=None, strand=None) : all gene IDs in the annotation database

gene_id_of_gene_name(gene_name) : translate gene name to its corresponding Ensembl gene ID
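
Translating between gene names and gene IDs can be pictured as a pair of lookup tables built from the same records. The sketch below is illustrative only: the `TOY_PAIRS` data, the placeholder IDs, and the `toy_*` helpers are hypothetical and do not reflect how pyensembl stores or queries its annotation database.

```python
# Toy (gene_id, gene_name) pairs; the IDs are placeholders, not real Ensembl IDs.
TOY_PAIRS = [
    ("TOYG0001", "GENE_A"),
    ("TOYG0002", "GENE_B"),
]

# One table per direction, built from the same records.
name_by_id = {gene_id: name for gene_id, name in TOY_PAIRS}
id_by_name = {name: gene_id for gene_id, name in TOY_PAIRS}


def toy_gene_id_of_gene_name(gene_name):
    """Translate a gene name to its ID, failing loudly on unknown names."""
    if gene_name not in id_by_name:
        raise ValueError("Unknown gene name: %r" % (gene_name,))
    return id_by_name[gene_name]


def toy_gene_name_of_gene_id(gene_id):
    """Translate a gene ID to its name, failing loudly on unknown IDs."""
    if gene_id not in name_by_id:
        raise ValueError("Unknown gene ID: %r" % (gene_id,))
    return name_by_id[gene_id]


gene_id = toy_gene_id_of_gene_name("GENE_A")
print(gene_id, toy_gene_name_of_gene_id(gene_id))
```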

Transcript Names

transcript_names(contig=None, strand=None) : all transcript names in the annotation database

Transcript IDs

transcript_ids(contig=None, strand=None) : returns all transcript IDs in the annotation database

transcript_ids_of_gene_id(gene_id) : return IDs of all transcripts associated with given gene ID

transcript_ids_of_gene_name(gene_name) : return IDs of all transcripts associated with given gene name

transcript_id_of_transcript_name(transcript_name) : translate transcript name to its ID

transcript_ids_of_exon_id(exon_id) : return IDs of all transcripts associated with given exon ID

Exon IDs

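exon_ids(contig=None, strand=None) : returns all exon IDs in the annotation database

exon_ids_of_gene_id(gene_id) : return IDs of all exons associated with given gene ID

exon_ids_of_gene_name(gene_name) : return IDs of all exons associated with given gene name

exon_ids_of_transcript_name(transcript_name) : return IDs of all exons associated with given transcript name

exon_ids_of_transcript_id(transcript_id) : return IDs of all exons associated with given transcript ID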
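
The transcript and exon queries above follow a gene, transcript, exon hierarchy, in which one exon may belong to several transcripts. A rough sketch of that traversal over toy data follows. The dictionaries, placeholder IDs, and `toy_*` helpers are hypothetical, and removing duplicate exon IDs is an assumption about the desired result, not documented pyensembl behaviour.

```python
# Toy annotation hierarchy with placeholder IDs (not real Ensembl identifiers).
TRANSCRIPTS_OF_GENE = {
    "TOYG0001": ["TOYT0001", "TOYT0002"],
}
EXONS_OF_TRANSCRIPT = {
    "TOYT0001": ["TOYE0001", "TOYE0002"],
    "TOYT0002": ["TOYE0002", "TOYE0003"],  # TOYE0002 is shared
}


def toy_exon_ids_of_gene_id(gene_id):
    """Collect exon IDs across all transcripts of a gene, without duplicates."""
    seen = set()
    exon_ids = []
    for transcript_id in TRANSCRIPTS_OF_GENE.get(gene_id, []):
        for exon_id in EXONS_OF_TRANSCRIPT.get(transcript_id, []):
            if exon_id not in seen:
                seen.add(exon_id)
                exon_ids.append(exon_id)
    return exon_ids


def toy_transcript_ids_of_exon_id(exon_id):
    """Return every transcript whose exon list contains the given exon."""
    return [
        transcript_id
        for transcript_id, exon_ids in EXONS_OF_TRANSCRIPT.items()
        if exon_id in exon_ids
    ]


print(toy_exon_ids_of_gene_id("TOYG0001"))
print(toy_transcript_ids_of_exon_id("TOYE0002"))
```

The reverse lookup returns a list because a shared exon maps back to more than one transcript.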

Locations

These functions currently assume that each gene maps to a single unique location. That assumption fails both for heavily copied genes (e.g. U1) and for genes in polymorphic regions (e.g. HLA genes).

location_of_gene_name(gene_name)

location_of_gene_id(gene_id)

location_of_transcript_id(transcript_id)

location_of_exon_id(exon_id)
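
The single-location caveat can be illustrated with a toy table in which one gene name appears at two loci. This is a sketch under stated assumptions, not pyensembl's code: the `Location` tuple mirrors the (contig, start position, end position, strand) fields mentioned earlier, while `TOY_ROWS`, `GENE_MULTI`, and the `toy_*` helpers are hypothetical.

```python
from collections import defaultdict, namedtuple

Location = namedtuple("Location", ["contig", "start", "end", "strand"])

# Toy rows with made-up coordinates; GENE_MULTI is annotated at two loci.
TOY_ROWS = [
    ("GENE_A", Location("6", 100, 500, "+")),
    ("GENE_MULTI", Location("1", 1000, 1100, "+")),
    ("GENE_MULTI", Location("17", 2000, 2100, "-")),
]

locations_by_name = defaultdict(list)
for gene_name, location in TOY_ROWS:
    locations_by_name[gene_name].append(location)


def toy_locations_of_gene_name(gene_name):
    """Return every annotated location for a name (possibly more than one)."""
    return list(locations_by_name.get(gene_name, []))


def toy_unique_location_of_gene_name(gene_name):
    """Return the single location, or raise if the name is missing or ambiguous."""
    locations = toy_locations_of_gene_name(gene_name)
    if len(locations) != 1:
        raise ValueError(
            "Expected 1 location for %r, found %d" % (gene_name, len(locations))
        )
    return locations[0]


print(toy_unique_location_of_gene_name("GENE_A"))
print(len(toy_locations_of_gene_name("GENE_MULTI")))
```

A caller who needs one location per gene can use the strict variant and handle the error, rather than silently receiving an arbitrary copy.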

Start Codons

start_codon_of_transcript_id

start_codon_of_transcript_name

Stop Codons

stop_codon_of_transcript_id

stop_codon_of_transcript_name
