Python interface to Ensembl reference genome metadata
pyensembl
Python interface to Ensembl reference genome metadata (exons, transcripts, etc.)
from pyensembl import EnsemblRelease
# release 77 uses human reference genome GRCh38
data = EnsemblRelease(77)
# will return ['HLA-A']
gene_names = data.gene_names_at_locus(contig=6, position=29945884)
# get all exons associated with HLA-A
exon_ids = data.exon_ids_of_gene_name('HLA-A')
API
The EnsemblRelease object has methods to let you access all possible combinations of the annotation features gene_name, gene_id, transcript_name, transcript_id, exon_id as well as the location of these genomic elements (contig, start position, end position, strand).
Gene Names
gene_names() : returns all gene names in the annotation database
gene_names_on_contig(contig) : all gene names on a particular chromosome/contig
gene_names_at_locus(contig, position) : names of genes overlapping with the given locus (returns a list to account for overlapping genes)
gene_names_at_loci(contig, start, end) : names of genes overlapping with the given range of loci
gene_name_of_gene_id(gene_id) : name of gene with given ID
gene_name_of_transcript_id(transcript_id) : name of gene associated with given transcript ID
gene_name_of_transcript_name(transcript_name) : name of gene associated with given transcript name
gene_name_of_exon_id(exon_id) : name of gene associated with given exon ID
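As a rough illustration of how the locus lookup might be used in bulk, the sketch below maps several positions on one contig to the gene names overlapping each of them. `ToyRelease`, `ToyGene`, and the coordinates are hypothetical stand-ins so the snippet runs without downloading any Ensembl data; with a real `EnsemblRelease` object, only the `gene_names_at_locus(contig, position)` call listed above is relied on.

```python
from collections import namedtuple

# Hypothetical stand-in for EnsemblRelease; genes and coordinates are invented.
ToyGene = namedtuple("ToyGene", ["name", "contig", "start", "end"])


class ToyRelease:
    """Mimics only gene_names_at_locus, using an in-memory list of toy genes."""

    def __init__(self, genes):
        self._genes = genes

    def gene_names_at_locus(self, contig, position):
        # A gene overlaps the locus if the position lies within [start, end].
        return [
            g.name
            for g in self._genes
            if g.contig == contig and g.start <= position <= g.end
        ]


def gene_names_by_position(release, contig, positions):
    """Map each position to the list of gene names overlapping it."""
    return {
        pos: release.gene_names_at_locus(contig=contig, position=pos)
        for pos in positions
    }


toy = ToyRelease([
    ToyGene("GENE_A", 6, 100, 200),
    ToyGene("GENE_B", 6, 150, 300),  # overlaps GENE_A between 150 and 200
    ToyGene("GENE_C", 7, 100, 200),  # same coordinates, different contig
])
hits = gene_names_by_position(toy, contig=6, positions=[120, 180, 400])
print(hits)
```

Because the lookup returns a list, positions with no gene map to an empty list and positions inside overlapping genes map to several names.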
Gene IDs
gene_ids() : all gene IDs in the annotation database
gene_ids_on_contig(contig) : all gene IDs on a particular chromosome/contig
gene_id_of_gene_name(gene_name) : translate gene name to its corresponding Ensembl gene ID
Transcript Names
transcript_names() : all transcript names in the annotation database
transcript_names_on_contig(contig) : all transcript names on a particular chromosome/contig
Transcript IDs
transcript_ids() : returns all transcript IDs in the annotation database
transcript_ids_of_gene_id(gene_id) : return IDs of all transcripts associated with given gene ID
transcript_ids_of_gene_name(gene_name) : return IDs of all transcripts associated with given gene name
transcript_id_of_transcript_name(transcript_name) : translate transcript name to its ID
transcript_ids_of_exon_id(exon_id) : return IDs of all transcripts associated with given exon ID
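The gene and transcript lookups can be chained. The sketch below groups transcript IDs by gene for one contig, using `gene_ids_on_contig` and `transcript_ids_of_gene_id` as listed above. `ToyRelease` and the short IDs (`G1`, `T1`, and so on) are hypothetical placeholders, not real Ensembl identifiers, and exist only so the example runs offline.

```python
class ToyRelease:
    """Hypothetical stand-in exposing two of the documented methods."""

    def __init__(self, genes_by_contig, transcripts_by_gene):
        self._genes_by_contig = genes_by_contig
        self._transcripts_by_gene = transcripts_by_gene

    def gene_ids_on_contig(self, contig):
        return list(self._genes_by_contig.get(contig, []))

    def transcript_ids_of_gene_id(self, gene_id):
        return list(self._transcripts_by_gene.get(gene_id, []))


def transcripts_per_gene(release, contig):
    """Return {gene_id: [transcript_id, ...]} for every gene on the contig."""
    return {
        gene_id: release.transcript_ids_of_gene_id(gene_id)
        for gene_id in release.gene_ids_on_contig(contig)
    }


toy = ToyRelease(
    genes_by_contig={6: ["G1", "G2"], 7: ["G3"]},
    transcripts_by_gene={"G1": ["T1", "T2"], "G2": ["T3"], "G3": ["T4"]},
)
grouped = transcripts_per_gene(toy, contig=6)
print(grouped)
```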
Exon IDs
exon_ids() : returns all exon IDs in the annotation database
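exon_ids_of_gene_id(gene_id) : IDs of all exons associated with given gene ID
exon_ids_of_gene_name(gene_name) : IDs of all exons associated with given gene name
exon_ids_of_transcript_name(transcript_name) : IDs of all exons associated with given transcript name
exon_ids_of_transcript_id(transcript_id) : IDs of all exons associated with given transcript ID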
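One possible use of the exon lookups is to find the exons that every transcript of a gene has in common. The sketch below combines `transcript_ids_of_gene_name` and `exon_ids_of_transcript_id`, assuming each takes a single argument and returns a list of IDs. `ToyRelease` and its IDs are hypothetical stand-ins so the code runs without Ensembl data.

```python
class ToyRelease:
    """Hypothetical stand-in exposing two of the documented methods."""

    def __init__(self, transcripts_by_gene_name, exons_by_transcript):
        self._transcripts_by_gene_name = transcripts_by_gene_name
        self._exons_by_transcript = exons_by_transcript

    def transcript_ids_of_gene_name(self, gene_name):
        return list(self._transcripts_by_gene_name.get(gene_name, []))

    def exon_ids_of_transcript_id(self, transcript_id):
        return list(self._exons_by_transcript.get(transcript_id, []))


def exons_shared_by_all_transcripts(release, gene_name):
    """Return the set of exon IDs present in every transcript of the gene."""
    transcript_ids = release.transcript_ids_of_gene_name(gene_name)
    exon_sets = [
        set(release.exon_ids_of_transcript_id(t)) for t in transcript_ids
    ]
    if not exon_sets:
        # No transcripts known for this gene name.
        return set()
    return set.intersection(*exon_sets)


toy = ToyRelease(
    transcripts_by_gene_name={"GENE_A": ["T1", "T2", "T3"]},
    exons_by_transcript={
        "T1": ["E1", "E2", "E3"],
        "T2": ["E1", "E3"],
        "T3": ["E1", "E3", "E4"],
    },
)
shared = exons_shared_by_all_transcripts(toy, "GENE_A")
print(sorted(shared))
```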
Locations
These functions currently assume that each gene maps to a single unique location. That assumption does not hold for heavily copied genes (e.g. U1) or for polymorphic regions (e.g. HLA genes).
location_of_gene_name(gene_name)
location_of_gene_id(gene_id)
location_of_transcript_id(transcript_id)
location_of_exon_id(exon_id)
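The return format of the location functions is not spelled out here. The sketch below assumes a location is a `(contig, start, end, strand)` tuple with inclusive coordinates, following the elements named in the API overview; that tuple layout is an assumption, as are the hypothetical `ToyRelease` class and its invented coordinates. Under those assumptions, it checks whether two genes overlap on the same contig.

```python
class ToyRelease:
    """Hypothetical stand-in; assumes locations are (contig, start, end, strand)."""

    def __init__(self, gene_locations):
        self._gene_locations = gene_locations

    def location_of_gene_name(self, gene_name):
        return self._gene_locations[gene_name]


def locations_overlap(loc_a, loc_b):
    """True if both locations are on the same contig and their ranges intersect."""
    contig_a, start_a, end_a, _strand_a = loc_a
    contig_b, start_b, end_b, _strand_b = loc_b
    return contig_a == contig_b and start_a <= end_b and start_b <= end_a


def genes_overlap(release, name_a, name_b):
    """Compare the single location reported for each of two gene names."""
    return locations_overlap(
        release.location_of_gene_name(name_a),
        release.location_of_gene_name(name_b),
    )


toy = ToyRelease({
    "GENE_A": (6, 100, 200, "+"),
    "GENE_B": (6, 150, 300, "-"),  # overlaps GENE_A, opposite strand
    "GENE_C": (7, 100, 200, "+"),  # same range, different contig
})
a_and_b = genes_overlap(toy, "GENE_A", "GENE_B")
a_and_c = genes_overlap(toy, "GENE_A", "GENE_C")
print(a_and_b, a_and_c)
```

The check ignores strand, so genes on opposite strands of the same region count as overlapping. Since each gene is treated as having one location, the result is only as reliable as that assumption.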
Start Codons
start_codon_of_transcript_id(transcript_id)
start_codon_of_transcript_name(transcript_name)
Stop Codons
stop_codon_of_transcript_id(transcript_id)
stop_codon_of_transcript_name(transcript_name)