pyensembl

Python interface to Ensembl reference genome metadata

Maintainers

hammerlab iskander openvax tavinathanson timodonnell

Development Status
- 3 - Alpha
Environment
- Console
Intended Audience
- Science/Research
License
- OSI Approved :: Apache Software License
Operating System
- OS Independent
Programming Language
- Python
Topic
- Scientific/Engineering :: Bio-Informatics

Project description

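.. image:: https://travis-ci.org/hammerlab/pyensembl.svg?branch=master
   :target: https://travis-ci.org/hammerlab/pyensembl
   :alt: Build Status
.. image:: https://coveralls.io/repos/hammerlab/pyensembl/badge.svg?branch=master&service=github
   :target: https://coveralls.io/github/hammerlab/pyensembl?branch=master
   :alt: Coverage Status
.. image:: https://zenodo.org/badge/18834/hammerlab/pyensembl.svg
   :target: https://zenodo.org/badge/latestdoi/18834/hammerlab/pyensembl
   :alt: DOI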

PyEnsembl
=========

Python interface to Ensembl reference genome metadata (exons,
transcripts, etc.)

Example Usage
=============

::

    from pyensembl import EnsemblRelease

    # release 77 uses human reference genome GRCh38
    data = EnsemblRelease(77)

    # will return ['HLA-A']
    gene_names = data.gene_names_at_locus(contig=6, position=29945884)

    # get all exons associated with HLA-A
    exon_ids = data.exon_ids_of_gene_name('HLA-A')

Installation
============

You can install PyEnsembl using
`pip <https://pip.pypa.io/en/latest/quickstart.html>`_:

::

    pip install pyensembl

This should also install any required packages, such as
`datacache <https://github.com/hammerlab/datacache>`_ and
`BioPython <http://biopython.org/>`_.

Before using PyEnsembl, run the following command to download and
install Ensembl data:

::

    pyensembl install --release <list of Ensembl release numbers> --species <species-name>

For example, ``pyensembl install --release 75 76 --species human`` will
download and install all human reference data from Ensembl releases 75
and 76.

Alternatively, you can create the ``EnsemblRelease`` object from inside
a Python process and call ``ensembl_object.download()`` followed by
``ensembl_object.index()``.

Cache Location
--------------

By default, PyEnsembl uses the platform-specific ``Cache`` folder and
caches the files in its ``pyensembl`` sub-directory. You can override
this default by setting the environment variable ``PYENSEMBL_CACHE_DIR``
to your preferred cache location:

::

    export PYENSEMBL_CACHE_DIR=/custom/cache/dir

or

::

    import os

    os.environ['PYENSEMBL_CACHE_DIR'] = '/custom/cache/dir'
    # ... PyEnsembl API usage
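The lookup order described above (environment variable first, otherwise a
platform-specific cache folder plus a ``pyensembl`` sub-directory) can be
sketched as follows. ``resolve_cache_dir`` is a hypothetical helper, not
PyEnsembl's code, and the per-platform defaults (``~/Library/Caches`` on
macOS, ``%LOCALAPPDATA%`` on Windows, ``$XDG_CACHE_HOME`` or ``~/.cache``
elsewhere) are common conventions assumed here rather than paths taken from
PyEnsembl.

::

    import os
    import sys
    from pathlib import Path


    def resolve_cache_dir(app_name="pyensembl"):
        """Hypothetical sketch: pick a cache directory, honoring an override.

        Not PyEnsembl's implementation; the platform defaults are assumptions.
        """
        # An explicitly set, non-empty override always wins.
        override = os.environ.get("PYENSEMBL_CACHE_DIR")
        if override:
            return Path(override)
        # Otherwise fall back to a conventional per-platform cache folder.
        if sys.platform == "darwin":
            base = Path.home() / "Library" / "Caches"
        elif sys.platform.startswith("win"):
            base = Path(os.environ.get("LOCALAPPDATA", Path.home() / "AppData" / "Local"))
        else:
            base = Path(os.environ.get("XDG_CACHE_HOME", Path.home() / ".cache"))
        # Files live in an application-specific sub-directory.
        return base / app_name


    print(resolve_cache_dir())

Setting ``PYENSEMBL_CACHE_DIR`` in the environment changes the first branch
only; the fallback paths are never consulted once an override is present.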

Non-Ensembl Data
================

PyEnsembl also supports arbitrary genomes: you can point it at local
file paths or remote URLs for Ensembl or non-Ensembl GTF and FASTA
files. (Warning: GTF formats vary, and handling of non-Ensembl data is
still very much in development.)

For example:

::

    from pyensembl import Genome

    data = Genome(
        reference_name='GRCh38',
        annotation_name='my_genome_features',
        gtf_path_or_url='/My/local/gtf/path_to_my_genome_features.gtf')
    # parse GTF and construct database of genomic features
    data.index()
    gene_names = data.gene_names_at_locus(contig=6, position=29945884)
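Since GTF dialects vary, it helps to know what a record looks like: nine
tab-separated columns, the last holding ``key "value";`` attribute pairs
such as ``gene_id`` and ``gene_name``. The parser below is a minimal,
hypothetical sketch (``parse_gtf_line`` is not a PyEnsembl function, and the
record shown is made up). It assumes no attribute value contains a
semicolon, and it keeps only the last value when a key repeats.

::

    def parse_gtf_line(line):
        """Hypothetical sketch: split one GTF record into fields plus attributes."""
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            raise ValueError("expected 9 tab-separated columns, got %d" % len(fields))
        seqname, source, feature, start, end, score, strand, frame, attributes = fields
        attrs = {}
        # Attribute pairs are separated by ';' (assumed never to occur inside a value).
        for item in attributes.strip().split(";"):
            item = item.strip()
            if not item:
                continue  # skip the empty piece after a trailing ';'
            key, _, value = item.partition(" ")
            # Drop surrounding whitespace and quotes; a repeated key overwrites.
            attrs[key] = value.strip().strip('"')
        return {
            "seqname": seqname,
            "feature": feature,
            "start": int(start),  # GTF coordinates are 1-based and inclusive
            "end": int(end),
            "strand": strand,
            "attributes": attrs,
        }


    # A made-up gene record, for illustration only.
    line = "\t".join(["6", "toy_source", "gene", "100", "200", ".", "+", ".",
                      'gene_id "G1"; gene_name "GENE_A";'])
    record = parse_gtf_line(line)
    print(record["attributes"]["gene_name"])  # GENE_A

A non-Ensembl GTF that omits ``gene_name`` or uses other attribute keys
would simply yield a different ``attributes`` dict, which is one reason
such files may not index cleanly.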

API
===

The ``EnsemblRelease`` object provides methods for looking up any
combination of the annotation features *gene\_name*, *gene\_id*,
*transcript\_name*, *transcript\_id* and *exon\_id*, as well as the
location of these genomic elements (contig, start position, end
position, strand).
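These lookups are answered from the database of genomic features that
``index()`` builds from the GTF file. The sketch below is a toy illustration
of that idea, not PyEnsembl's code: it assumes a single flat table created
with Python's built-in ``sqlite3`` module, with hypothetical rows pairing
each exon with its transcript and gene, plus a hypothetical ``lookup``
helper that mirrors methods such as ``gene_name_of_transcript_id`` by
selecting one column filtered on another.

::

    import sqlite3

    # Hypothetical toy rows (not PyEnsembl's schema or real Ensembl data):
    # (gene_id, gene_name, transcript_id, transcript_name, exon_id)
    ROWS = [
        ("G1", "GENE_A", "T1", "GENE_A-001", "E1"),
        ("G1", "GENE_A", "T1", "GENE_A-001", "E2"),
        ("G1", "GENE_A", "T2", "GENE_A-002", "E2"),  # exon E2 is shared by T1 and T2
        ("G2", "GENE_B", "T3", "GENE_B-001", "E3"),
    ]
    COLUMNS = {"gene_id", "gene_name", "transcript_id", "transcript_name", "exon_id"}


    def build_db(rows):
        """Load the toy rows into an in-memory SQLite table."""
        conn = sqlite3.connect(":memory:")
        conn.execute(
            "CREATE TABLE annotation (gene_id TEXT, gene_name TEXT, "
            "transcript_id TEXT, transcript_name TEXT, exon_id TEXT)"
        )
        conn.executemany("INSERT INTO annotation VALUES (?, ?, ?, ?, ?)", rows)
        return conn


    def lookup(conn, select_column, filter_column, value):
        """Sorted distinct values of select_column where filter_column == value."""
        # Column names cannot be bound as SQL parameters, so whitelist them.
        if select_column not in COLUMNS or filter_column not in COLUMNS:
            raise ValueError("unknown column")
        query = "SELECT DISTINCT %s FROM annotation WHERE %s = ? ORDER BY 1" % (
            select_column, filter_column)
        return [row[0] for row in conn.execute(query, (value,))]


    conn = build_db(ROWS)
    print(lookup(conn, "gene_name", "transcript_id", "T2"))  # ['GENE_A']
    print(lookup(conn, "transcript_id", "exon_id", "E2"))    # ['T1', 'T2']
    print(lookup(conn, "exon_id", "gene_name", "GENE_A"))    # ['E1', 'E2']

The ``DISTINCT`` matters because a flat, exon-level table repeats gene and
transcript values on every row.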

Genes
-----

``genes(contig=None, strand=None)`` : returns list of Gene objects,
optionally restricted to a particular contig or strand.

``genes_at_locus(contig, position, end=None, strand=None)`` : returns a
list of Gene objects overlapping a particular position on a contig. Pass
``end`` to extend the query to a range, and ``strand='+'`` or
``strand='-'`` to restrict it to the forward or reverse strand.
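The overlap rule behind ``genes_at_locus`` can be sketched in plain Python.
This is an illustration under stated assumptions, not PyEnsembl's
implementation: ``ToyGene`` and ``genes_overlapping`` are hypothetical
names, coordinates are assumed to be 1-based and inclusive (as in GTF), and
contig names are compared as strings so that ``contig=6`` matches ``"6"``.

::

    from collections import namedtuple

    # Hypothetical stand-in for a gene record; not PyEnsembl's Gene class.
    ToyGene = namedtuple("ToyGene", ["gene_id", "gene_name", "contig", "start", "end", "strand"])


    def genes_overlapping(genes, contig, position, end=None, strand=None):
        """Genes on `contig` overlapping [position, end]; a single base if end is None."""
        if end is None:
            end = position
        if end < position:
            raise ValueError("end must be >= position")
        return [
            g for g in genes
            if g.contig == str(contig)          # assumed: contigs stored as strings
            and g.start <= end                  # gene starts before the query ends
            and g.end >= position               # gene ends after the query starts
            and (strand is None or g.strand == strand)
        ]


    GENES = [
        ToyGene("G1", "GENE_A", "6", 100, 200, "+"),
        ToyGene("G2", "GENE_B", "6", 150, 300, "-"),
        ToyGene("G3", "GENE_C", "7", 100, 200, "+"),
    ]
    print([g.gene_name for g in genes_overlapping(GENES, 6, 160)])  # ['GENE_A', 'GENE_B']

Two intervals overlap exactly when each starts no later than the other ends,
which is why a position can return several genes and why the name-based
variant returns a list.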

``gene_by_id(gene_id)`` : returns the Gene object for the given Ensembl
gene ID (e.g. "ENSG00000068793")

``gene_names(contig=None, strand=None)`` : returns all gene names in the
annotation database, optionally restricted to a particular contig or
strand.

``genes_by_name(gene_name)`` : returns a list with one Gene object for
each distinct gene ID carrying the given name (a name can map to several
IDs, e.g. due to copies in the genome).
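Because one name can map to several gene IDs, a by-name query has to
deduplicate on the ID rather than on the name. A minimal sketch, assuming
hypothetical ``ToyGene`` records that might come from a flat annotation
table where the same gene ID repeats; ``genes_by_name_sketch`` is an
illustrative helper, not part of PyEnsembl.

::

    from collections import namedtuple

    # Hypothetical stand-in for a gene record; not PyEnsembl's Gene class.
    ToyGene = namedtuple("ToyGene", ["gene_id", "gene_name", "contig", "start", "end"])


    def genes_by_name_sketch(genes, gene_name):
        """One record per distinct gene ID with `gene_name`, in first-seen order."""
        seen = {}  # dicts keep insertion order (Python 3.7+)
        for g in genes:
            if g.gene_name == gene_name and g.gene_id not in seen:
                seen[g.gene_id] = g
        return list(seen.values())


    ROWS = [
        ToyGene("G1", "GENE_A", "1", 100, 200),
        ToyGene("G1", "GENE_A", "1", 100, 200),    # same gene ID repeated
        ToyGene("G9", "GENE_A", "5", 900, 1000),   # a second copy elsewhere
        ToyGene("G2", "GENE_B", "1", 300, 400),
    ]
    print([g.gene_id for g in genes_by_name_sketch(ROWS, "GENE_A")])  # ['G1', 'G9']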

``gene_by_protein_id(protein_id)`` : find Gene associated with the given
Ensembl protein ID (e.g. "ENSP00000350283")

``gene_names_at_locus(contig, position, end=None, strand=None)`` : names
of genes overlapping with the given locus (returns a list to account for
overlapping genes)

``gene_name_of_gene_id(gene_id)`` : name of gene with given ID

``gene_name_of_transcript_id(transcript_id)`` : name of gene associated
with given transcript ID

``gene_name_of_transcript_name(transcript_name)`` : name of gene
associated with given transcript name

``gene_name_of_exon_id(exon_id)`` : name of gene associated with given
exon ID

``gene_ids(contig=None, strand=None)`` : all gene IDs in the annotation
database

``gene_ids_of_gene_name(gene_name)`` : all Ensembl gene IDs with the
given name

Transcripts
-----------

``transcripts(contig=None, strand=None)`` : returns list of Transcript
objects for all transcript entries in the Ensembl database, optionally
restricted to a particular contig or strand.

``transcript_by_id(transcript_id)`` : construct Transcript object for
given Ensembl transcript ID (e.g. "ENST00000369985")

``transcripts_by_name(transcript_name)`` : returns list of Transcript
objects for every transcript matching the given name.

``transcript_names(contig=None, strand=None)`` : all transcript names in
the annotation database

``transcript_ids(contig=None, strand=None)`` : returns all transcript
IDs in the annotation database

``transcript_ids_of_gene_id(gene_id)`` : return IDs of all transcripts
associated with given gene ID

``transcript_ids_of_gene_name(gene_name)`` : return IDs of all
transcripts associated with given gene name

``transcript_ids_of_transcript_name(transcript_name)`` : find all
Ensembl transcript IDs with the given name

``transcript_ids_of_exon_id(exon_id)`` : return IDs of all transcripts
associated with given exon ID

Exons
-----

``exon_ids(contig=None, strand=None)`` : returns list of exon IDs in
the annotation database, optionally restricted to the given chromosome
and strand

``exon_ids_of_gene_id(gene_id)`` : returns list of exon IDs associated
with a given gene ID

``exon_ids_of_gene_name(gene_name)`` : returns list of exon IDs
associated with a given gene name

``exon_ids_of_transcript_id(transcript_id)`` : returns list of exon IDs
associated with a given transcript ID

``exon_ids_of_transcript_name(transcript_name)`` : returns list of exon
IDs associated with a given transcript name

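.. |Build Status| image:: https://travis-ci.org/hammerlab/pyensembl.svg?branch=master
   :target: https://travis-ci.org/hammerlab/pyensembl
.. |Coverage Status| image:: https://coveralls.io/repos/hammerlab/pyensembl/badge.svg?branch=master&service=github
   :target: https://coveralls.io/github/hammerlab/pyensembl?branch=master
.. |DOI| image:: https://zenodo.org/badge/18834/hammerlab/pyensembl.svg
   :target: https://zenodo.org/badge/latestdoi/18834/hammerlab/pyensembl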


Download files

Source Distribution

pyensembl-1.1.0.tar.gz (59.0 kB)

Uploaded Jan 24, 2017 Source

File details

Details for the file pyensembl-1.1.0.tar.gz.

File metadata

Download URL: pyensembl-1.1.0.tar.gz
Upload date: Jan 24, 2017
Size: 59.0 kB
Tags: Source
Uploaded using Trusted Publishing? No

File hashes

Hashes for pyensembl-1.1.0.tar.gz
Algorithm	Hash digest
SHA256	`fe6512f86e29538c22f518828c9cf745ba97ca895dfbf3dfe6a6acdf31f9b5f6`
MD5	`97eb9223fdfbc5bcf6a39d00010fe298`
BLAKE2b-256	`232f398e41dfcaf1ddd4c349f9006de2397e23044c7c9c543840f239af6390e4`