Python interface to ensembl reference genome metadata

pyensembl

Python interface to Ensembl reference genome metadata (exons, transcripts, etc.)

from pyensembl import EnsemblRelease

# release 77 uses human reference genome GRCh38
data = EnsemblRelease(77)

# will return ['HLA-A']
gene_names = data.gene_names_at_locus(contig=6, position=29945884)

# get all exons associated with HLA-A
exon_ids = data.exon_ids_of_gene_name('HLA-A')

API

The EnsemblRelease object has methods to let you access all possible combinations of the annotation features gene_name, gene_id, transcript_name, transcript_id, exon_id as well as the location of these genomic elements (contig, start position, end position, strand).

Gene Names

gene_names() : returns all gene names in the annotation database

gene_names_on_contig(contig) : all gene names on a particular chromosome/contig

gene_names_at_locus(contig, position) : names of genes overlapping with the given locus (returns a list to account for overlapping genes)

gene_names_at_loci(contig, start, end) : names of genes overlapping with the given range of loci

gene_name_of_gene_id(gene_id) : name of gene with given ID

gene_name_of_transcript_id(transcript_id) : name of gene associated with given transcript ID

gene_name_of_transcript_name(transcript_name) : name of gene associated with given transcript name

gene_name_of_exon_id(exon_id) : name of gene associated with given exon ID
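
As a rough sketch of how the locus lookup above might be used in bulk, the helper below collects the gene names found at several positions on one contig. The name `genes_at_positions` is hypothetical and not part of pyensembl. It relies only on `gene_names_at_locus(contig, position)` as listed above and expects an `EnsemblRelease`-like object to be passed in.

```python
def genes_at_positions(data, contig, positions):
    """Return {position: sorted gene names} for each position on a contig.

    Assumption: `data` is an EnsemblRelease-like object exposing
    gene_names_at_locus(contig, position), which returns a list because
    several genes may overlap the same locus.
    """
    result = {}
    for position in positions:
        names = data.gene_names_at_locus(contig=contig, position=position)
        # Deduplicate and sort so the output is stable across calls.
        result[position] = sorted(set(names))
    return result
```

With a real release this could be called as `genes_at_positions(EnsemblRelease(77), 6, [29945884])`, reusing the locus from the opening example.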

Gene IDs

gene_ids() : all gene IDs in the annotation database

gene_ids_on_contig(contig) : all gene IDs on a particular chromosome/contig

gene_id_of_gene_name(gene_name) : translate gene name to its corresponding Ensembl gene ID
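
One way to sanity-check the name and ID translations is a round trip: look up the ID for a name with `gene_id_of_gene_name`, then translate it back with `gene_name_of_gene_id`. The helper name `gene_name_round_trip` is made up for this sketch, and it assumes each name maps to exactly one ID, which may not hold for every gene.

```python
def gene_name_round_trip(data, gene_name):
    """Return (gene_id, ok) where ok is True if the ID maps back to gene_name.

    Assumption: `data` is an EnsemblRelease-like object and each lookup
    returns a single value rather than a list.
    """
    gene_id = data.gene_id_of_gene_name(gene_name)
    recovered = data.gene_name_of_gene_id(gene_id)
    return gene_id, recovered == gene_name
```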

Transcript Names

transcript_names() : all transcript names in the annotation database

transcript_names_on_contig(contig) : all transcript names on a particular chromosome/contig

Transcript IDs

transcript_ids() : returns all transcript IDs in the annotation database

transcript_ids_of_gene_id(gene_id) : return IDs of all transcripts associated with given gene ID

transcript_ids_of_gene_name(gene_name) : return IDs of all transcripts associated with given gene name

transcript_id_of_transcript_name(transcript_name) : translate transcript name to its ID

transcript_ids_of_exon_id(exon_id) : return IDs of all transcripts associated with given exon ID
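
The per-gene lookup above lends itself to building a small table for several genes at once. The following is a minimal sketch, with the hypothetical helper name `transcript_ids_by_gene`, that uses only `transcript_ids_of_gene_name(gene_name)` and takes the `EnsemblRelease`-like object as an argument.

```python
def transcript_ids_by_gene(data, gene_names):
    """Map each gene name to the list of its transcript IDs.

    Assumption: data.transcript_ids_of_gene_name(name) returns an
    iterable of ID strings; it is copied into a list here.
    """
    return {
        name: list(data.transcript_ids_of_gene_name(name))
        for name in gene_names
    }
```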

Exon IDs

exon_ids() : returns all exon IDs in the annotation database

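exon_ids_of_gene_id(gene_id) : return IDs of all exons associated with given gene ID

exon_ids_of_gene_name(gene_name) : return IDs of all exons associated with given gene name

exon_ids_of_transcript_name(transcript_name) : return IDs of all exons associated with given transcript name

exon_ids_of_transcript_id(transcript_id) : return IDs of all exons associated with given transcript ID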
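
Combining the transcript and exon lookups gives a per-transcript exon listing for a gene. This is only a sketch: `exon_ids_by_transcript` is a hypothetical helper, and it assumes `exon_ids_of_transcript_id` accepts a single transcript ID as its first argument, following the naming pattern of the other methods.

```python
def exon_ids_by_transcript(data, gene_name):
    """Return {transcript_id: [exon_id, ...]} for every transcript of a gene.

    Assumptions: `data` is an EnsemblRelease-like object, and
    exon_ids_of_transcript_id takes one transcript ID positionally.
    """
    table = {}
    for transcript_id in data.transcript_ids_of_gene_name(gene_name):
        table[transcript_id] = list(data.exon_ids_of_transcript_id(transcript_id))
    return table
```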

Locations

These functions currently assume that each gene maps to a single unique location. That assumption fails both for heavily copied genes (e.g. U1) and for polymorphic regions (e.g. HLA genes).

location_of_gene_name(gene_name)

location_of_gene_id(gene_id)

location_of_transcript_id(transcript_id)

location_of_exon_id(exon_id)
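
Because the location functions assume one location per gene, it can help to flag gene names that show up on more than one contig before trusting a single result. The sketch below does this with `gene_names_on_contig(contig)` only. The helper name `genes_on_multiple_contigs` is hypothetical, and the check catches copies on different contigs, not multiple copies within one contig.

```python
from collections import defaultdict


def genes_on_multiple_contigs(data, contigs):
    """Return {gene_name: [contigs]} for names seen on more than one contig.

    Assumption: `data` is an EnsemblRelease-like object whose
    gene_names_on_contig(contig) returns an iterable of gene names.
    """
    contigs_by_gene = defaultdict(set)
    for contig in contigs:
        for name in data.gene_names_on_contig(contig):
            contigs_by_gene[name].add(contig)
    # Sort by string form so mixed int/str contig labels do not raise.
    return {
        name: sorted(found, key=str)
        for name, found in contigs_by_gene.items()
        if len(found) > 1
    }
```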

Start Codons

start_codon_of_transcript_id(transcript_id)

start_codon_of_transcript_name(transcript_name)

Stop Codons

stop_codon_of_transcript_id(transcript_id)

stop_codon_of_transcript_name(transcript_name)

