
An easy way to access EnsEMBL data with Python.

Project Description

pyEnsemblRest is a simple Python wrapper around the EnsEMBL REST API.


pyEnsemblRest - A wrapper for the EnsEMBL REST API

Copyright (C) 2013-2016, Steve Moss

pyEnsemblRest is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

pyEnsemblRest is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with pyEnsemblRest. If not, see <>.


git clone
cd pyEnsemblRest
sudo python setup.py install


To import and set up a new EnsemblRest object:

from ensemblrest import EnsemblRest
ensRest = EnsemblRest()

By default, an EnsemblRest() instance points to the main EnsEMBL REST server. To use Ensembl Genomes, import a different class:

from ensemblrest import EnsemblGenomeRest
ensGenomeRest = EnsemblGenomeRest()

Alternatively, you can pass a different base URL when instantiating the EnsemblRest class:

from ensemblrest import EnsemblRest
ensGenomeRest = EnsemblRest(base_url='')

To use a custom EnsEMBL REST server, set up EnsemblRest in the same way:

from ensemblrest import EnsemblRest
# set up the REST object to point to a local server; 3000 is the default REST server port
ensRest = EnsemblRest(base_url='http://localhost:3000')

You may also provide proxy server settings in the form of a dict, as follows:

from ensemblrest import EnsemblRest
# setup rest object to point to a proxy server
ensRest = EnsemblRest(proxies={'http':'', 'https':''})

EnsEMBL enforces a rate limit of 15 requests per second. To stay under it, you can pause between requests:

from time import sleep
# sleep for a second so we don't get rate-limited
sleep(1)

Alternatively, rely on the library itself, which checks your requests and limits them to 15 per second. Do not run several Python processes in parallel to fetch your data (the library's limit only applies within a single process), otherwise the Ensembl team will blacklist you. If you need to make many requests, consider using endpoints that support POST, or ask the Ensembl team to add POST support to the endpoints you need.
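To illustrate what limiting to 15 requests per second means, here is a minimal sliding-window sketch. `RateLimiter` is a hypothetical helper, not part of pyEnsemblRest (which, as stated above, already throttles requests within one process); the replaceable `clock` and `sleep` arguments exist only so the class can be tested without waiting.

```python
import time


class RateLimiter:
    """Allow at most `max_calls` calls in any `period`-second window (sketch).

    Hypothetical helper, not part of pyEnsemblRest.
    """

    def __init__(self, max_calls=15, period=1.0, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self.sleep = sleep
        self.calls = []  # timestamps of the calls inside the current window

    def _prune(self, now):
        # Drop calls that are older than the window
        self.calls = [t for t in self.calls if now - t < self.period]

    def wait(self):
        """Block until another call is allowed, then record it."""
        now = self.clock()
        self._prune(now)
        while len(self.calls) >= self.max_calls:
            # Sleep until the oldest call in the window expires
            self.sleep(self.period - (now - self.calls[0]))
            now = self.clock()
            self._prune(now)
        self.calls.append(now)
```

Calling `limiter.wait()` before each request in a loop keeps that loop within the limit; the window is tracked per `RateLimiter` object, so, like the library's own limit, it does not coordinate separate processes.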

GET endpoints

The EnsemblRest and EnsemblGenomeRest methods are not statically defined in the library, so you cannot read their docstrings with help() in a Python or IPython terminal. However, all the methods available for the ensembl and ensemblgenomes REST servers can be listed once the class is instantiated. For help on a particular method, refer to the ensembl and ensemblgenomes REST documentation for the corresponding endpoint; note that endpoints on ensembl may differ from ensemblgenomes endpoints.

Each endpoint's documentation (for example, that of the sequence endpoint) lists required and optional parameters. Required parameters must always be given, otherwise an exception is raised; optional parameters may be given or omitted depending on your request. In all cases, parameter names are the same as in the documentation. For example, to fetch data from the sequence endpoint you must specify at least the required parameters:
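The idea of methods that are created when the object is instantiated, with required parameters enforced, can be sketched as follows. This is a hypothetical illustration, not pyEnsemblRest's actual implementation: the `ENDPOINTS` table, `TableDrivenClient` class and the use of `ValueError` are assumptions (the real library's configuration and exception type may differ), and the sketch returns the URL and query parameters instead of sending a request.

```python
import re

# Hypothetical endpoint table: the name, URL template and optional parameters
# are illustrative only and are NOT pyEnsemblRest's real configuration.
ENDPOINTS = {
    "getSequenceById": {
        "url": "/sequence/id/{id}",
        "optional": {"mask", "expand_5prime"},
    },
}


class TableDrivenClient:
    """Sketch of a client whose methods are created at instantiation time."""

    def __init__(self, base_url, table=ENDPOINTS):
        self.base_url = base_url
        # Attach one generated function per endpoint to this instance
        for name, spec in table.items():
            setattr(self, name, self._make_method(name, spec))

    def _make_method(self, name, spec):
        # Placeholders such as {id} in the URL template are the required parameters
        required = set(re.findall(r"\{(\w+)\}", spec["url"]))

        def method(**kwargs):
            missing = required - set(kwargs)
            if missing:
                raise ValueError("%s: missing required parameter(s): %s"
                                 % (name, ", ".join(sorted(missing))))
            path = spec["url"].format(**kwargs)
            # Everything that is not part of the path becomes a query parameter
            query = {k: v for k, v in kwargs.items() if k not in required}
            # A real client would send the HTTP request here; the sketch returns its parts
            return self.base_url + path, query

        method.__name__ = name
        return method
```

Because the generated functions live on the instance rather than on the class, `help(TableDrivenClient)` says nothing about them while `dir(client)` lists them, which is consistent with the behaviour described above.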

seq = ensRest.getSequenceById(id='ENSG00000157764')

To mask the sequence and expand the 5’ UTR, set the corresponding optional parameters, using the names given in the documentation:

seq = ensRest.getSequenceById(id='ENSG00000157764', mask="soft", expand_5prime=1000)

For GET methods, multiple values for a parameter can be passed as a list. For example, to get the same results as

curl ';feature=transcript;feature=cds;feature=exon' -H 'Content-type:application/json'

as described in the documentation of the overlap region GET endpoint, you can call:

data = ensRest.getOverlapByRegion(species="human", region="7:140424943-140624564", feature=["gene", "transcript", "cds", "exon"])
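Passing a list makes the key repeat in the query string, as in the curl command above. The sketch below shows that encoding; `build_query` is a hypothetical helper, not part of pyEnsemblRest, and the `;` separator is taken from the curl example rather than from the library's code.

```python
from urllib.parse import quote


def build_query(params):
    """Encode params as a ';'-separated query string (hypothetical helper).

    List values are expanded into one key=value pair per item, so a key
    such as "feature" is repeated, as in the curl command above.
    """
    pairs = []
    for key, value in params.items():
        values = value if isinstance(value, (list, tuple)) else [value]
        for item in values:
            pairs.append("%s=%s" % (quote(str(key)), quote(str(item))))
    return ";".join(pairs)


print(build_query({"feature": ["gene", "transcript", "cds", "exon"]}))
# feature=gene;feature=transcript;feature=cds;feature=exon
```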

POST endpoints

POST endpoints are used in the same way as GET endpoints; the only difference is that they accept Python lists as parameters, so that multiple queries can be performed on the same Ensembl endpoint in a single request. Parameter names are the same as in the documentation. For example, the POST sequence endpoint can be used as follows:

seqs = ensRest.getSequenceByMultipleIds(ids=["ENSG00000157764", "ENSG00000248378" ])

Here the documentation's example value { "ids" : ["ENSG00000157764", "ENSG00000248378" ] } becomes the keyword argument ids=["ENSG00000157764", "ENSG00000248378"]. As in the previous example, optional parameters can be added:

seqs = ensRest.getSequenceByMultipleIds(ids=["ENSG00000157764", "ENSG00000248378"], mask="soft")
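The correspondence also works in the other direction: serializing the keyword argument with the standard json module reproduces the documentation's example value. This only illustrates the mapping; how pyEnsemblRest actually builds the request body is not shown here and may differ.

```python
import json

# Keyword arguments as passed to a POST method in the example above
kwargs = {"ids": ["ENSG00000157764", "ENSG00000248378"]}

# Serialized, they give back the example value from the documentation
body = json.dumps(kwargs)
print(body)  # {"ids": ["ENSG00000157764", "ENSG00000248378"]}
```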

Change the default output format

You can change the default output format by passing a supported Content-type using the content_type parameter, for example:

plain_xml = ensRest.getArchiveById(id='ENSG00000157764', content_type="text/xml")

For a complete list of supported content types, see Supported MIME Types in the Ensembl REST documentation. You also need to check that the chosen content type is supported by the specific endpoint, as listed in its description.
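Different content types need different handling on your side. A minimal sketch, assuming you have the response body as a string (the `plain_xml` name in the example suggests XML comes back as raw text); `decode_response` is a hypothetical helper, not part of pyEnsemblRest.

```python
import json
import xml.etree.ElementTree as ET


def decode_response(text, content_type):
    """Decode a response body according to its content type (hypothetical helper)."""
    # Ignore parameters such as "; charset=utf-8"
    mime = content_type.split(";")[0].strip().lower()
    if mime == "application/json":
        return json.loads(text)
    if mime == "text/xml":
        return ET.fromstring(text)
    # Anything else (e.g. FASTA or Newick text) is returned unchanged
    return text
```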

Methods list

Below is the list of all defined methods. Methods called on the ensRest instance are specific to the ensembl REST server, while methods called on the ensGenomeRest instance are specific to the ensemblgenomes REST server.

To access the Archive endpoints you can use the following methods:

print(ensRest.getArchiveById(id="ENSG00000157764"))
print(ensRest.getArchiveByMultipleIds(id=["ENSG00000157764", "ENSG00000248378"]))

To access the Comparative Genomics endpoints you can use the following methods:

print(ensGenomeRest.getGeneFamilyById(id="MF_01687", compara="bacteria"))
print(ensGenomeRest.getGeneFamilyMemberById(id="b0344", compara="bacteria"))
print(ensGenomeRest.getGeneFamilyMemberBySymbol(symbol="lacZ", species="escherichia_coli_str_k_12_substr_mg1655", compara="bacteria"))
# Change the returned content type to "Newick" format
print(ensRest.getGeneTreeById(id='ENSGT00390000003602', nh_format="simple", content_type="text/x-nh"))
print(ensRest.getGeneTreeMemberById(id='ENSG00000157764'))
print(ensRest.getGeneTreeMemberBySymbol(species='human', symbol='BRCA2'))
print(ensRest.getAlignmentByRegion(species="taeniopygia_guttata", region="2:106040000-106040050:1", species_set_group="sauropsids"))
print(ensRest.getHomologyById(id='ENSG00000157764'))
print(ensRest.getHomologyBySymbol(species='human', symbol='BRCA2'))

To access the Cross References endpoints you can use the following methods:

print(ensRest.getXrefsById(id='ENSG00000157764'))
print(ensRest.getXrefsByName(species='human', name='BRCA2'))
print(ensRest.getXrefsBySymbol(species='human', symbol='BRCA2'))

To access the Information endpoints you can use the following methods:

print(ensRest.getInfoAnalysis(species="homo_sapiens"))
print(ensRest.getInfoAssembly(species="homo_sapiens", bands=1))  # bands is an optional parameter
print(ensRest.getInfoAssemblyRegion(species="homo_sapiens", region_name="X"))
print(ensRest.getInfoBiotypes(species="homo_sapiens"))
print(ensRest.getInfoComparaMethods())
print(ensRest.getInfoComparaSpeciesSets(methods="EPO"))
print(ensRest.getInfoComparas())
print(ensRest.getInfoData())
print(ensGenomeRest.getInfoEgVersion())
print(ensRest.getInfoExternalDbs(species="homo_sapiens"))
print(ensGenomeRest.getInfoDivisions())
print(ensGenomeRest.getInfoGenomesByName(name="campylobacter_jejuni_subsp_jejuni_bh_01_0142"))

# This response is very heavy
# print(ensGenomeRest.getInfoGenomes())

print(ensGenomeRest.getInfoGenomesByAccession(division="U00096"))
print(ensGenomeRest.getInfoGenomesByAssembly(division="GCA_000005845"))
print(ensGenomeRest.getInfoGenomesByDivision(division="EnsemblPlants"))
print(ensGenomeRest.getInfoGenomesByTaxonomy(division="Arabidopsis"))
print(ensRest.getInfoPing())
print(ensRest.getInfoRest())
print(ensRest.getInfoSoftware())
print(ensRest.getInfoSpecies(division="ensembl"))
print(ensRest.getInfoVariation(species="homo_sapiens"))
# Restrict the populations returned, e.g. to only those with LD data. It is highly recommended
# to set a filter rather than load the complete list of populations.
print(ensRest.getInfoVariationPopulations(species="homo_sapiens", filter="LD"))

To access the Linkage Disequilibrium endpoints you can use the following methods:

print(ensRest.getLdId(species="human", id="rs1042779", population_name="1000GENOMES:phase_3:KHV", window_size=500, d_prime=1.0))
print(ensRest.getLdPairwise(species="human", id1="rs6792369", id2="rs1042779"))
print(ensRest.getLdRegion(species="human", region="6:25837556..25843455", population_name="1000GENOMES:phase_3:KHV"))

To access the Lookup endpoints you can use the following methods:

print(ensRest.getLookupById(id='ENSG00000157764'))
print(ensRest.getLookupByMultipleIds(ids=["ENSG00000157764", "ENSG00000248378"]))
print(ensRest.getLookupBySymbol(species="homo_sapiens", symbol="BRCA2", expand=1))
print(ensRest.getLookupByMultipleSymbols(species="homo_sapiens", symbols=["BRCA2", "BRAF"]))

To access the Mapping endpoints you can use the following methods:

print(ensRest.getMapCdnaToRegion(id='ENST00000288602', region='100..300'))
print(ensRest.getMapCdsToRegion(id='ENST00000288602', region='1..1000'))
print(ensRest.getMapAssemblyOneToTwo(species='human', asm_one='NCBI36', region='X:1000000..1000100:1', asm_two='GRCh37'))
print(ensRest.getMapTranslationToRegion(id='ENSP00000288602', region='100..300'))

To access the Ontologies and Taxonomy endpoints you can use the following methods:

print(ensRest.getAncestorsById(id='GO:0005667'))
print(ensRest.getAncestorsChartById(id='GO:0005667'))
print(ensRest.getDescendantsById(id='GO:0005667'))
print(ensRest.getOntologyById(id='GO:0005667'))
print(ensRest.getOntologyByName(name='transcription factor complex'))
print(ensRest.getTaxonomyClassificationById(id='9606'))
print(ensRest.getTaxonomyById(id='9606'))
print(ensRest.getTaxonomyByName(name="Homo%25"))
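The name "Homo%25" above is the percent-encoded form of "Homo%": a literal "%" must itself be written as "%25" inside a URL. The standard library shows the round trip; treating "%" as a server-side wildcard is an assumption to confirm in the endpoint documentation.

```python
from urllib.parse import quote, unquote

pattern = "Homo%"          # "%" presumably works as a wildcard on the server side
encoded = quote(pattern)   # percent-encode "%" itself so it survives in the URL
print(encoded)             # Homo%25
print(unquote(encoded))    # Homo%
```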

To access the Overlap endpoints you can use the following methods:

print(ensRest.getOverlapById(id="ENSG00000157764", feature="gene"))
print(ensRest.getOverlapByRegion(species="human", region="7:140424943-140624564", feature="gene"))
print(ensRest.getOverlapByTranslation(id="ENSP00000288602"))

To access the Regulation endpoints you can use the following method:

print(ensRest.getRegulatoryFeatureById(species="homo_sapiens", id="ENSR00001348195"))

To access the Sequences endpoints you can use the following methods:

print(ensRest.getSequenceById(id='ENSG00000157764'))
print(ensRest.getSequenceByMultipleIds(ids=["ENSG00000157764", "ENSG00000248378"]))
print(ensRest.getSequenceByRegion(species='human', region='X:1000000..1000100'))
print(ensRest.getSequenceByMultipleRegions(species="homo_sapiens", regions=["X:1000000..1000100:1", "ABBA01004489.1:1..100"]))
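The region strings in these examples come in a few shapes: `name:start..end` or `name:start-end`, optionally followed by `:strand`, where the name may be a chromosome or a contig accession. A hypothetical parser (not part of pyEnsemblRest) can help check regions before sending them; the pattern covers only the shapes that appear in this document. It deliberately does not require start <= end, because the VEP region example (9:22125503-22125502:1) encodes an insertion with end one less than start.

```python
import re

# Matches e.g. "X:1000000..1000100:1", "7:140424943-140624564" or "ABBA01004489.1:1..100"
REGION_RE = re.compile(
    r"^(?P<name>[^:]+):(?P<start>\d+)(?:\.\.|-)(?P<end>\d+)(?::(?P<strand>-?1))?$"
)


def parse_region(region):
    """Split a region string into (name, start, end, strand); strand may be None."""
    match = REGION_RE.match(region)
    if match is None:
        raise ValueError("unrecognised region: %r" % region)
    strand = int(match.group("strand")) if match.group("strand") else None
    return match.group("name"), int(match.group("start")), int(match.group("end")), strand
```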

To access the Transcript Haplotypes endpoints you can use the following method:

print(ensRest.getTranscripsHaplotypes(species="homo_sapiens", id="ENST00000288602"))

To access the VEP endpoints you can use the following methods:

print(ensRest.getVariantConsequencesByHGVSnotation(species="human", hgvs_notation="AGT:c.803T>C"))
print(ensRest.getVariantConsequencesById(species='human', id='COSM476'))
print(ensRest.getVariantConsequencesByMultipleIds(species="human", ids=["rs116035550", "COSM476"]))
print(ensRest.getVariantConsequencesByRegion(species='human', region='9:22125503-22125502:1', allele='C'))
print(ensRest.getVariantConsequencesByMultipleRegions(species="human", variants=["21 26960070 rs116645811 G A . . .", "21 26965148 rs1135638 G A . . ."]))
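The multiple-regions VEP method takes variants as space-separated, VCF-style strings: chromosome, position, ID, reference allele and alternative allele, followed by the QUAL, FILTER and INFO columns, which the examples leave as "." placeholders. A small hypothetical helper (not part of pyEnsemblRest) that builds such strings:

```python
def vcf_variant(chrom, pos, var_id, ref, alt):
    """Format one variant as a space-separated VCF-style line (hypothetical helper).

    QUAL, FILTER and INFO are left as "." placeholders, as in the examples above;
    a missing ID is also written as ".".
    """
    fields = [str(chrom), str(pos), var_id or ".", ref, alt, ".", ".", "."]
    return " ".join(fields)


variants = [
    vcf_variant(21, 26960070, "rs116645811", "G", "A"),
    vcf_variant(21, 26965148, "rs1135638", "G", "A"),
]
print(variants[0])  # 21 26960070 rs116645811 G A . . .
```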

To access the Variation endpoints you can use the following methods:

print(ensRest.getVariationById(id="rs56116432", species="homo_sapiens"))
print(ensRest.getVariationByMultipleIds(ids=["rs56116432", "COSM476"], species="homo_sapiens"))

To access the Variation GA4GH endpoints you can use the following methods:

print(ensRest.searchGA4GHCallSet(variantSetId=1, pageSize=2))
print(ensRest.getGA4GHCallSetById(id="1:NA19777"))
print(ensRest.searchGA4GHDataset(pageSize=3))
print(ensRest.getGA4GHDatasetById(id="6e340c4d1e333c7a676b1710d2e3953c"))
print(ensRest.getGA4GHVariantsById(id="1:rs1333049"))
print(ensRest.searchGA4GHVariants(variantSetId=1, referenceName=22, start=17190024, end=17671934, pageToken="", pageSize=1))
print(ensRest.searchGA4GHVariantsets(datasetId="6e340c4d1e333c7a676b1710d2e3953c", pageToken="", pageSize=2))
print(ensRest.getGA4GHVariantsetsById(id=1))
print(ensRest.searchGA4GHReferences(referenceSetId="GRCh38", pageSize=10))
print(ensRest.getGA4GHReferencesById(id="9489ae7581e14efcad134f02afafe26c"))
print(ensRest.searchGA4GHReferenceSets())
print(ensRest.getGA4GHReferenceSetsById(id="GRCh38"))
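The GA4GH search methods take `pageToken` and `pageSize`, so large result sets arrive in pages. A sketch of walking through all pages, assuming each search method returns the decoded JSON response as a dict with the results under a key such as "variants" and a "nextPageToken" that is empty or absent on the last page, as in the GA4GH schema. `iter_pages` is a hypothetical helper, not part of pyEnsemblRest.

```python
def iter_pages(search, results_key, **params):
    """Yield every item from a paginated GA4GH-style search (sketch).

    `search` is any callable accepting pageToken=... plus the other params
    and returning a dict shaped as described in the lead-in.
    """
    token = ""  # an empty token requests the first page
    while True:
        page = search(pageToken=token, **params)
        for item in page.get(results_key, []):
            yield item
        token = page.get("nextPageToken")
        if not token:  # empty or missing token: this was the last page
            break
```

For example, `iter_pages(ensRest.searchGA4GHVariants, "variants", variantSetId=1, referenceName=22, start=17190024, end=17671934, pageSize=10)` would yield the variants page by page. Each page is a separate request, so it counts toward the rate limit.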
Download Files

pyensemblrest-0.2.2.tar.gz (22.9 kB), source distribution, uploaded Jun 13, 2016
