Python interface to Ensembl reference genome metadata

These details have been verified by PyPI

Maintainers

hammerlab iskander openvax tavinathanson timodonnell

These details have not been verified by PyPI

Project description

PyEnsembl

PyEnsembl is a Python interface to Ensembl reference genome metadata such as exons and transcripts. PyEnsembl downloads GTF and FASTA files from the Ensembl FTP server and loads them into a local database. PyEnsembl can also work with custom reference data specified using user-supplied GTF and FASTA files.

Example Usage

from pyensembl import EnsemblRelease

# release 77 uses human reference genome GRCh38
data = EnsemblRelease(77)

# will return ['HLA-A']
gene_names = data.gene_names_at_locus(contig=6, position=29945884)

# get all exons associated with HLA-A
exon_ids = data.exon_ids_of_gene_name('HLA-A')
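gene_names_at_locus returns a list because more than one gene can overlap a position. As a rough illustration of that overlap test, the sketch below filters a small hypothetical in-memory list of gene records. It assumes 1-based, inclusive coordinates as in GTF files and is not how PyEnsembl implements the query against its local database.

```python
from typing import NamedTuple, Optional


class GeneRecord(NamedTuple):
    """Hypothetical stand-in for one gene row (1-based, inclusive coordinates)."""
    gene_name: str
    contig: str
    start: int
    end: int
    strand: str  # '+' or '-'


def gene_names_at_locus(
    genes: list[GeneRecord],
    contig,
    position: int,
    end: Optional[int] = None,
    strand: Optional[str] = None,
) -> list[str]:
    """Return names of genes overlapping contig:position..end, optionally on one strand."""
    query_end = position if end is None else end
    names = []
    for g in genes:
        # Contigs may be passed as ints (e.g. 6), so compare as strings.
        if g.contig != str(contig):
            continue
        if strand is not None and g.strand != strand:
            continue
        # Two closed intervals overlap when each one starts before the other ends.
        if g.start <= query_end and position <= g.end:
            names.append(g.gene_name)
    return names


# Hypothetical toy data, not real gene coordinates.
toy_genes = [
    GeneRecord("GENE_A", "6", 100, 200, "+"),
    GeneRecord("GENE_B", "6", 150, 300, "-"),
    GeneRecord("GENE_C", "7", 100, 200, "+"),
]
print(gene_names_at_locus(toy_genes, contig=6, position=160))  # ['GENE_A', 'GENE_B']
```

Both GENE_A and GENE_B cover position 160, which is why a list is the natural return type.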

Installation

PyEnsembl requires Python 3.9 or later. You can install PyEnsembl using pip:

pip install pyensembl

This should also install any required packages such as datacache.

Before using PyEnsembl, run the following command to download and install Ensembl data:

pyensembl install --release <list of Ensembl release numbers> --species <species-name>

For example, pyensembl install --release 75 76 --species human will download and install all human reference data from Ensembl releases 75 and 76.

Alternatively, you can create the EnsemblRelease object from inside a Python process and call ensembl_object.download() followed by ensembl_object.index().
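If you install several releases from a script, one option is to assemble the pyensembl install command shown above and run it with subprocess. The helper names below are hypothetical; only the command-line arguments come from this page, and actually running install() needs the pyensembl CLI on your PATH plus network access.

```python
import shutil
import subprocess
from typing import Iterable


def build_install_command(releases: Iterable[int], species: str) -> list[str]:
    """Assemble `pyensembl install --release ... --species ...` as an argv list."""
    release_args = [str(r) for r in releases]
    if not release_args:
        raise ValueError("at least one Ensembl release number is required")
    return ["pyensembl", "install", "--release", *release_args, "--species", species]


def install(releases: Iterable[int], species: str) -> None:
    """Run the install command; raises if the CLI is missing or the command fails."""
    if shutil.which("pyensembl") is None:
        raise RuntimeError("pyensembl CLI not found; run `pip install pyensembl` first")
    subprocess.run(build_install_command(releases, species), check=True)


cmd = build_install_command([75, 76], "human")
print(" ".join(cmd))  # pyensembl install --release 75 76 --species human
```

Passing an argv list (rather than a shell string) avoids quoting problems with species names.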

Development Setup

For development, install PyEnsembl in editable mode with development dependencies:

git clone https://github.com/openvax/pyensembl.git
cd pyensembl
pip install -e ".[dev]"

This installs the package in development mode along with tools for testing, linting, and building:

pytest for running tests
ruff and flake8 for code linting
pytest-cov for coverage reporting
build for package building

Run tests with:

pytest

Cache Location

By default, PyEnsembl caches downloaded files in a pyensembl sub-directory of the platform-specific cache folder. You can override this default by setting the environment variable PYENSEMBL_CACHE_DIR to your preferred cache location:

export PYENSEMBL_CACHE_DIR=/custom/cache/dir

Or, from Python, before using the PyEnsembl API:

import os

os.environ['PYENSEMBL_CACHE_DIR'] = '/custom/cache/dir'
# ... PyEnsembl API usage
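The lookup order described above, environment variable first and platform default second, can be sketched as a small function. The fallback directories below are assumptions based on common per-platform cache conventions; PyEnsembl computes its real default internally, so the exact path on your machine may differ. Passing env explicitly makes the hypothetical helper easy to test without touching os.environ.

```python
import os
import sys
from pathlib import Path
from typing import Mapping, Optional


def resolve_cache_dir(
    env: Optional[Mapping[str, str]] = None,
    platform: str = sys.platform,
) -> Path:
    """Hypothetical helper: return PYENSEMBL_CACHE_DIR if set, else an assumed default."""
    env = os.environ if env is None else env
    override = env.get("PYENSEMBL_CACHE_DIR")
    if override:
        return Path(override)
    home = Path(env["HOME"]) if "HOME" in env else Path.home()
    if platform == "darwin":
        base = home / "Library" / "Caches"
    elif platform.startswith("win"):
        base = Path(env["LOCALAPPDATA"]) if "LOCALAPPDATA" in env else home / "AppData" / "Local"
    else:
        # XDG convention used on Linux and most other Unix-like systems.
        base = Path(env["XDG_CACHE_HOME"]) if "XDG_CACHE_HOME" in env else home / ".cache"
    # Files live in a "pyensembl" sub-directory of the cache folder.
    return base / "pyensembl"
```

An empty PYENSEMBL_CACHE_DIR is treated as unset here; that is a design choice of the sketch, not documented PyEnsembl behavior.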

Usage tips

List installed genomes

To see the genomes for which PyEnsembl has already downloaded and indexed metadata, run:

pyensembl list

Or equivalently do this in Python:

from pyensembl.shell import collect_all_installed_ensembl_releases
collect_all_installed_ensembl_releases()

Load genome in Python

Here's an example Python snippet that loads fly genome data from Ensembl release v100:

from pyensembl import EnsemblRelease
data = EnsemblRelease(release=100, species='drosophila_melanogaster')

Data structures

Gene

gene = data.gene_by_id(gene_id='FBgn0011747')

Transcript

transcript = gene.transcripts[0]

Protein information

transcript.protein_id
transcript.protein_sequence
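Since a Gene holds its Transcript objects and each Transcript exposes protein_id, collecting the proteins encoded by a gene is a short traversal. This sketch assumes that non-coding transcripts report protein_id as None, and it uses SimpleNamespace stand-ins so it runs without downloaded data; with real data you would pass the gene returned by data.gene_by_id(...).

```python
from types import SimpleNamespace
from typing import Any


def protein_ids_of_gene(gene: Any) -> list[str]:
    """Collect protein IDs from a gene's transcripts, skipping non-coding ones.

    Assumes each transcript has a `protein_id` attribute that is None when the
    transcript does not encode a protein.
    """
    return [t.protein_id for t in gene.transcripts if t.protein_id is not None]


# Hypothetical stand-in objects shaped like PyEnsembl's Gene and Transcript.
toy_gene = SimpleNamespace(
    gene_id="GENE0001",
    transcripts=[
        SimpleNamespace(transcript_id="TX0001", protein_id="PROT0001"),
        SimpleNamespace(transcript_id="TX0002", protein_id=None),
    ],
)
print(protein_ids_of_gene(toy_gene))  # ['PROT0001']
```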

Non-Ensembl Data

PyEnsembl also supports arbitrary genomes: you can specify local file paths or remote URLs for both Ensembl and non-Ensembl GTF and FASTA files. (Warning: GTF formats can vary, and handling of non-Ensembl data is still very much in development.)

For example:

from pyensembl import Genome
data = Genome(
    reference_name='GRCh38',
    annotation_name='my_genome_features',
    # annotation_version=None,
    gtf_path_or_url='/My/local/gtf/path_to_my_genome_features.gtf', # Path or URL of GTF file
    # transcript_fasta_paths_or_urls=None, # List of paths or URLs of FASTA files containing transcript sequences
    # protein_fasta_paths_or_urls=None, # List of paths or URLs of FASTA files containing protein sequences
    # cache_directory_path=None, # Where to place downloaded and cached files for this genome
)
# parse GTF and construct database of genomic features
data.index()
gene_names = data.gene_names_at_locus(contig=6, position=29945884)
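GTF variation shows up mostly in the last column. Each GTF data line has nine tab-separated fields (seqname, source, feature, start, end, score, strand, frame, attributes), and the attribute field holds key "value"; pairs whose keys can differ between annotation sources. The simplified parser below illustrates that layout; it is not the parser PyEnsembl uses and skips edge cases such as semicolons inside quoted values.

```python
from typing import NamedTuple


class GtfRecord(NamedTuple):
    seqname: str
    source: str
    feature: str
    start: int
    end: int
    strand: str
    attributes: dict[str, str]


def parse_gtf_line(line: str) -> GtfRecord:
    """Parse one GTF data line (9 tab-separated columns) into a record."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated fields, got {len(fields)}")
    seqname, source, feature, start, end, _score, strand, _frame, attr_text = fields
    attributes = {}
    for item in attr_text.split(";"):
        item = item.strip()
        if not item:
            continue
        # Each item looks like: key "value". Repeated keys keep only the last value.
        key, _, value = item.partition(" ")
        attributes[key] = value.strip().strip('"')
    return GtfRecord(seqname, source, feature, int(start), int(end), strand, attributes)


example = "\t".join([
    "6", "my_source", "exon", "100", "200", ".", "+", ".",
    'gene_id "G1"; transcript_id "T1"; gene_name "GENE_A";',
])
record = parse_gtf_line(example)
print(record.feature, record.start, record.attributes["gene_name"])  # exon 100 GENE_A
```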

API

The EnsemblRelease object provides methods for looking up any combination of the annotation features gene_name, gene_id, transcript_name, transcript_id, and exon_id, as well as the locations of these genomic elements (contig, start position, end position, strand).

Genes

genes(contig=None, strand=None, biotype=None): Returns a list of Gene objects, optionally restricted to a particular contig, strand, or gene_biotype.
genes_at_locus(contig, position, end=None, strand=None): Returns a list of Gene objects overlapping a particular position on a contig. Pass end to extend the query into a range, and strand='+' or strand='-' to restrict it to the forward or reverse strand.
gene_by_id(gene_id): Returns a Gene object for the given Ensembl gene ID (e.g. "ENSG00000068793").
gene_names(contig=None, strand=None): Returns all gene names in the annotation database, optionally restricted to a particular contig or strand.
genes_by_name(gene_name): Returns a list with one Gene object for each distinct gene ID that has the given name (a name can map to several genes, e.g. because of copies in the genome).
gene_by_protein_id(protein_id): Returns the Gene associated with the given Ensembl protein ID (e.g. "ENSP00000350283").
gene_names_at_locus(contig, position, end=None, strand=None): Returns the names of genes overlapping the given locus, optionally restricted by strand. The result is a list because several genes can overlap the same locus.
gene_name_of_gene_id(gene_id): Returns the name of the gene with the given gene ID.
gene_name_of_transcript_id(transcript_id): Returns the name of the gene associated with the given transcript ID.
gene_name_of_transcript_name(transcript_name): Returns the name of the gene associated with the given transcript name.
gene_name_of_exon_id(exon_id): Returns the name of the gene associated with the given exon ID.
gene_ids(contig=None, strand=None, biotype=None): Returns all gene IDs in the annotation database, optionally restricted by chromosome name, strand, or gene_biotype.
gene_ids_of_gene_name(gene_name): Returns all Ensembl gene IDs with the given name.
nearest_gene(contig, position, end=None, strand=None): Returns (distance, Gene) for the gene whose locus is nearest to the position (or the position..end interval) on the given contig, even when no gene overlaps it. Returns (inf, None) when no candidates exist.
merged_gene_intervals(contig, strand=None): Returns the union of all gene loci on the contig as a sorted list of non-overlapping (start, end) tuples. Adjacent intervals (end + 1 == next start) are merged into one.
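The rule described for merged_gene_intervals (sort the gene loci, then fuse intervals that overlap or that touch, i.e. end + 1 == next start) is the standard sort-and-sweep interval merge. A standalone sketch on plain (start, end) tuples, assuming 1-based inclusive coordinates, looks like this; it follows the documented behavior but is not PyEnsembl's own code.

```python
from typing import Iterable


def merge_intervals(intervals: Iterable[tuple[int, int]]) -> list[tuple[int, int]]:
    """Merge inclusive (start, end) intervals that overlap or are adjacent."""
    merged: list[tuple[int, int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            # Overlapping or adjacent (previous end + 1 >= start): extend the previous interval.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


print(merge_intervals([(500, 600), (100, 200), (150, 250), (251, 300), (400, 450)]))
# [(100, 300), (400, 450), (500, 600)]
```

(150, 250) overlaps (100, 200), and (251, 300) is adjacent to the result, so all three collapse into (100, 300).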

Transcripts

transcripts(contig=None, strand=None, biotype=None): Returns a list of Transcript objects for all transcript entries in the Ensembl database, optionally restricted to a particular contig, strand, or transcript_biotype.
transcript_by_id(transcript_id): Returns a Transcript object for the given Ensembl transcript ID (e.g. "ENST00000369985").
transcripts_by_name(transcript_name): Returns a list of Transcript objects for every transcript matching the given name.
transcript_names(contig=None, strand=None): Returns all transcript names in the annotation database, optionally restricted to a particular contig or strand.
transcript_ids(contig=None, strand=None, biotype=None): Returns all transcript IDs in the annotation database, optionally restricted to a particular contig, strand, or transcript_biotype.
transcript_ids_of_gene_id(gene_id): Returns the IDs of all transcripts associated with the given gene ID.
transcript_ids_of_gene_name(gene_name): Returns the IDs of all transcripts associated with the given gene name.
transcript_ids_of_transcript_name(transcript_name): Returns all Ensembl transcript IDs with the given name.
transcript_ids_of_exon_id(exon_id): Returns the IDs of all transcripts associated with the given exon ID.
nearest_transcript(contig, position, end=None, strand=None): Returns (distance, Transcript) for the closest transcript on the contig. Returns (inf, None) when no candidates exist.
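nearest_gene and nearest_transcript both return a (distance, feature) pair and fall back to (inf, None) when there are no candidates. The sketch below mirrors that contract over (start, end, name) tuples already filtered to one contig and strand. The distance rule (0 for an overlap, otherwise the difference between the closest coordinates) and keeping the first feature on ties are assumptions, since this page does not define them.

```python
import math
from typing import Optional, Sequence


def interval_distance(q_start: int, q_end: int, f_start: int, f_end: int) -> int:
    """Assumed distance: 0 if the inclusive intervals overlap, else the gap between them."""
    if f_end < q_start:
        return q_start - f_end
    if f_start > q_end:
        return f_start - q_end
    return 0


def nearest(
    features: Sequence[tuple[int, int, str]],
    position: int,
    end: Optional[int] = None,
) -> tuple[float, Optional[str]]:
    """Return (distance, name) of the closest feature, or (inf, None) if there are none."""
    q_end = position if end is None else end
    best: tuple[float, Optional[str]] = (math.inf, None)
    for f_start, f_end, name in features:
        d = interval_distance(position, q_end, f_start, f_end)
        if d < best[0]:  # strict '<' keeps the first feature on ties
            best = (d, name)
    return best


toy_transcripts = [(100, 200, "TX_A"), (400, 500, "TX_B")]  # hypothetical data
print(nearest(toy_transcripts, 350))  # (50, 'TX_B')
print(nearest([], 350))  # (inf, None)
```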

Exons

exon_ids(contig=None, strand=None): Returns a list of exon IDs in the annotation database, optionally restricted by the given chromosome and strand.
exon_by_id(exon_id): Returns an Exon object for the given Ensembl exon ID (e.g. "ENSE00001209410").
exon_ids_of_gene_id(gene_id): Returns a list of exon IDs associated with a given gene ID.
exon_ids_of_gene_name(gene_name): Returns a list of exon IDs associated with a given gene name.
exon_ids_of_transcript_id(transcript_id): Returns a list of exon IDs associated with a given transcript ID.
exon_ids_of_transcript_name(transcript_name): Returns a list of exon IDs associated with a given transcript name.
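The lookup methods above can be pictured as filtered queries against the local database that PyEnsembl builds during indexing. The sketch uses an in-memory SQLite table with a hypothetical, simplified schema (one row per exon and transcript pairing) to show how a lookup such as exon_ids_of_gene_name or exon_ids_of_transcript_id reduces to a de-duplicated select; PyEnsembl's actual schema and queries may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE exon (exon_id TEXT, transcript_id TEXT, gene_id TEXT, gene_name TEXT)"
)
# Hypothetical toy rows: exon E2 is shared by two transcripts of the same gene.
conn.executemany(
    "INSERT INTO exon VALUES (?, ?, ?, ?)",
    [
        ("E1", "T1", "G1", "GENE_A"),
        ("E2", "T1", "G1", "GENE_A"),
        ("E2", "T2", "G1", "GENE_A"),
        ("E3", "T3", "G2", "GENE_B"),
    ],
)


def exon_ids_where(column: str, value: str) -> list[str]:
    """Distinct exon IDs for rows where `column` equals `value`."""
    # Column names cannot be bound as SQL parameters, so whitelist them.
    if column not in {"gene_id", "gene_name", "transcript_id"}:
        raise ValueError(f"unsupported column: {column}")
    rows = conn.execute(
        f"SELECT DISTINCT exon_id FROM exon WHERE {column} = ? ORDER BY exon_id", (value,)
    )
    return [row[0] for row in rows]


print(exon_ids_where("gene_name", "GENE_A"))  # ['E1', 'E2']
print(exon_ids_where("transcript_id", "T2"))  # ['E2']
```

DISTINCT matters because a shared exon appears once per transcript that contains it.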

Project details


Release history

2.10.1 (May 13, 2026)
2.10.0 (May 13, 2026)
2.9.8 (May 13, 2026)
2.9.7 (May 13, 2026)
2.9.6 (May 13, 2026)
2.9.5 (May 12, 2026)
2.9.4 (May 12, 2026)
2.9.3 (May 12, 2026)
2.9.2 (May 12, 2026)
2.9.1 (May 12, 2026)
2.9.0 (May 12, 2026)
2.8.0 (May 12, 2026)
2.7.0 (May 12, 2026)
2.6.13 (May 12, 2026)
2.6.7 (Apr 21, 2026)
2.6.6 (Apr 21, 2026)
2.6.5 (Apr 21, 2026)
2.6.4 (Apr 20, 2026)
2.6.2 (Apr 7, 2026)
2.6.1 (Apr 3, 2026)
2.6.0 (Apr 3, 2026)
2.3.13 (Apr 25, 2024)
2.3.12 (Mar 28, 2024)
2.3.11 (Feb 27, 2024)
2.3.10 (Feb 27, 2024)
2.3.9 (Jan 17, 2024)
2.3.8 (Jan 17, 2024)
2.3.7 (Jan 17, 2024)
2.3.6 (Jan 16, 2024)
2.3.4 (Jan 11, 2024)
2.3.3 (Jan 11, 2024)
2.3.2 (Jan 11, 2024)
2.3.1 (Jan 11, 2024)
2.3.0 (Jan 10, 2024)
2.2.9 (Aug 17, 2023)
2.2.8 (Feb 16, 2023)
2.2.7 (Feb 15, 2023)
2.2.6 (Feb 13, 2023)
2.2.5 (Feb 6, 2023)
2.2.4 (Dec 21, 2022)
2.2.3 (Dec 2, 2022)
2.2.2 (Dec 1, 2022)
2.2.1 (Dec 1, 2022)
2.2.0 (Dec 1, 2022)
2.1.0 (Oct 24, 2022)
2.0.2 (Oct 24, 2022)
2.0.1 (Aug 9, 2022)
2.0.0 (Apr 12, 2022)
1.9.4 (Oct 9, 2021)
1.9.3 (Oct 9, 2021)
1.9.2 (Sep 23, 2021)
1.9.1 (Dec 28, 2020)
1.9.0 (Aug 28, 2020)
1.8.8 (Aug 5, 2020)
1.8.7 (May 27, 2020)
1.8.6 (May 27, 2020)
1.8.5 (Jan 21, 2020)
1.8.4 (Oct 20, 2019)
1.8.3 (Oct 6, 2019)
1.8.2 (Oct 5, 2019)
1.8.1 (Oct 3, 2019)
1.8.0 (Sep 5, 2019)
1.7.5 (Jul 31, 2019)
1.7.4 (Apr 1, 2019)
1.7.3 (Nov 8, 2018)
1.7.2 (Sep 23, 2018)
1.7.1 (Sep 21, 2018)
1.7.0 (Sep 20, 2018)
1.6.0 (Jul 31, 2018)
1.5.2 (Jul 31, 2018)
1.5.0 (Jul 30, 2018)
1.4.0 (Jul 6, 2018)
1.3.0 (Jun 27, 2018)
1.2.6 (Feb 26, 2018)
1.2.4 (Feb 26, 2018)
1.2.3 (Feb 24, 2018)
1.2.2 (Feb 24, 2018)
1.2.1 (Feb 21, 2018)
1.1.0 (Jan 24, 2017)
1.0.3 (Oct 11, 2016)
1.0.2 (Oct 10, 2016)
1.0.1 (Sep 19, 2016)
1.0.0 (Sep 16, 2016)
0.9.7 (Sep 14, 2016)
0.9.6 (Sep 13, 2016)
0.9.5 (Jul 26, 2016)
0.9.4 (Jul 22, 2016)
0.9.3 (Jul 1, 2016)
0.9.1 (Jun 7, 2016)
0.9.0 (May 27, 2016)
0.8.14 (May 13, 2016)
0.8.13 (May 13, 2016)
0.8.12 (May 11, 2016)
0.8.11 (Mar 29, 2016)
0.8.10 (Mar 25, 2016)
0.8.9 (Mar 24, 2016)
0.8.8 (Feb 23, 2016)
0.8.7 (Feb 22, 2016)
0.8.5 (Feb 19, 2016)
0.8.4 (Oct 27, 2015)
0.8.3 (Sep 27, 2015)
0.8.2 (Aug 30, 2015)
0.8.1 (Aug 21, 2015)
0.7.0 (Aug 2, 2015)
0.6.11 (Jul 22, 2015)
0.6.10 (Jul 1, 2015)
0.6.9 (Jun 2, 2015)
0.6.8 (Apr 30, 2015)
0.6.7 (Apr 24, 2015)
0.6.5 (Apr 23, 2015)
0.6.4 (Apr 15, 2015)
0.6.3 (Apr 9, 2015)
0.6.2 (Mar 26, 2015)
0.6.1 (Mar 25, 2015)
0.6.0 (Mar 25, 2015)
0.5.13 (Mar 10, 2015)
0.5.12 (Mar 8, 2015)
0.5.11 (Mar 5, 2015)
0.5.10 (Mar 5, 2015)
0.5.9 (Mar 5, 2015)
0.5.8 (Mar 5, 2015)
0.5.7 (Mar 5, 2015)
0.5.4 (Feb 19, 2015)
0.5.3 (Feb 5, 2015)
0.5.2 (Jan 13, 2015)
0.5.1 (Dec 19, 2014)
0.5.0 (Dec 12, 2014)
0.4.0 (Dec 1, 2014)
0.3.3 (Oct 17, 2014)
0.3 (Oct 15, 2014)

Download files


Source Distribution

pyensembl-2.10.0.tar.gz (87.3 kB)

Uploaded May 13, 2026 Source

Built Distribution


pyensembl-2.10.0-py3-none-any.whl (66.5 kB)

Uploaded May 13, 2026 Python 3

File details

Details for the file pyensembl-2.10.0.tar.gz.

File metadata

Download URL: pyensembl-2.10.0.tar.gz
Upload date: May 13, 2026
Size: 87.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.6

File hashes

Hashes for pyensembl-2.10.0.tar.gz
Algorithm	Hash digest
SHA256	`00f827c926cb7faffe13c5bdb2c0246aa172581b1644b02b6a49c9f6eb58d070`
MD5	`95d622b0a52ce7181147f614ab29bb14`
BLAKE2b-256	`e19cb5e5b60d73ce3d04e8da9c35a0c8a7a022d4ee20041a0776cd48b5019381`


File details

Details for the file pyensembl-2.10.0-py3-none-any.whl.

File metadata

Download URL: pyensembl-2.10.0-py3-none-any.whl
Upload date: May 13, 2026
Size: 66.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.6

File hashes

Hashes for pyensembl-2.10.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`7b40277387320527f9822bb3ef8c0ecdd85638433ed12f6167d1983a565d96a7`
MD5	`6d99c2c940a878cee0766105786fdceb`
BLAKE2b-256	`ca64313d182039392fc15c323e9545229f11c4d03208e96271a51e4989497d9f`

