Easily map epitope coordinates between sequences in an alignment

Project description

Epitope aligner

Easily map epitope coordinates between sequences in an alignment, regardless of which coordinate system you are using. This lets you combine epitopes from different sources and calculate things like epitope density in a set of proteins.

epitope_aligner is a Python package hosted on GitHub at BarinthusBio/epitope_aligner.

Full documentation at barinthusbio.github.io/epitope_aligner.

If you have any suggestions or problems, please open an issue.

Install

Install directly from GitHub using one of (HTTPS or SSH):

  • pip install git+https://github.com/Vaccitech/epitope_aligner.git
  • pip install git+git@github.com:Vaccitech/epitope_aligner.git

Quickstart

The full quickstart example analyses and plots the epitopes from different strains of the influenza virus.

In this minimal example we'll:

  • convert epitope coordinates to an aligned antigen
  • float the epitope sequences to match it
  • calculate the number of epitopes at each position in the antigen

For the inverse of these aligning and floating operations see the cookbook.

Import the map, stretch, and utils modules from epitope_aligner, and pandas to build an example dataset.

# Note: importing map this way shadows Python's built-in map() in this namespace
from epitope_aligner import map, stretch, utils
import pandas as pd

We'll define a short example antigen sequence, with an aligned and unaligned version.

aligned_seq = "ABC---DEFGH-IJK--LM"
seq = aligned_seq.replace("-","")

We'll define some example epitopes with 1-based positions in the unaligned antigen sequence (end positions are inclusive).

epitopes = pd.DataFrame({
        'start':  [2,      6,      9],
        'end':    [4,      9,      12],
        'seq':    ["BCD",  "FGHI", "IJKL"],
        "length": [3,       4,     4]
})
epitopes
#    start  end   seq  length
# 0      2    4   BCD       3
# 1      6    9  FGHI       4
# 2      9   12  IJKL       4
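The example data implies a convention worth checking before aligning anything: with index=1, positions are 1-based and the end is inclusive, so each epitope is the Python slice seq[start-1:end] and its length is end - start + 1. A minimal sketch that verifies this with plain pandas and Python; slice_parent is a hypothetical helper written for illustration, not part of epitope_aligner.

```python
import pandas as pd

# Same example data as above.
aligned_seq = "ABC---DEFGH-IJK--LM"
seq = aligned_seq.replace("-", "")
epitopes = pd.DataFrame({
    "start": [2, 6, 9],
    "end": [4, 9, 12],
    "seq": ["BCD", "FGHI", "IJKL"],
    "length": [3, 4, 4],
})


def slice_parent(parent, start, end, index=1):
    """Hypothetical helper: residues start..end (inclusive), where `index` is the first position."""
    return parent[start - index:end - index + 1]


extracted = [
    slice_parent(seq, start, end, index=1)
    for start, end in zip(epitopes["start"], epitopes["end"])
]
print(extracted)  # ['BCD', 'FGHI', 'IJKL']

# With an inclusive end, length = end - start + 1.
print((epitopes["end"] - epitopes["start"] + 1).tolist())  # [3, 4, 4]
```

If the extracted slices did not match the seq column, the coordinates would be in a different system (for example 0-based or end-exclusive) and the index argument would need to change accordingly.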

Let's calculate the start positions of these epitopes in the aligned antigen sequence.

epitopes['newstart'] = map.align_coords(
    table = epitopes,
    aligned_parent_seq = aligned_seq,
    coord_col = "start",
    index = 1
)
epitopes
#    start  end   seq  length  newstart
# 0      2    4   BCD       3         2
# 1      6    9  FGHI       4         9
# 2      9   12  IJKL       4        13
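Conceptually, converting an unaligned position to an aligned one amounts to counting non-gap characters along the aligned sequence, and the inverse counts the residues before a given column. The sketch below illustrates both directions; unaligned_to_aligned and aligned_to_unaligned are hypothetical helpers, not the epitope_aligner API, and they assume "-" is the only gap character. The inverse here simply raises an error for a gap column, which the library may handle differently.

```python
def unaligned_to_aligned(aligned_seq, position, index=1, gap="-"):
    """Map a position in the ungapped sequence to its column in the aligned sequence."""
    residue_count = 0
    for column, char in enumerate(aligned_seq):
        if char != gap:
            if residue_count == position - index:
                return column + index
            residue_count += 1
    raise ValueError(f"position {position} is beyond the end of the sequence")


def aligned_to_unaligned(aligned_seq, column, index=1, gap="-"):
    """Inverse mapping: column in the aligned sequence to position in the ungapped sequence."""
    if aligned_seq[column - index] == gap:
        raise ValueError(f"column {column} is a gap")
    # Count the residues (non-gap characters) before this column.
    residues_before = sum(char != gap for char in aligned_seq[:column - index])
    return residues_before + index


aligned_seq = "ABC---DEFGH-IJK--LM"
new_starts = [unaligned_to_aligned(aligned_seq, start) for start in [2, 6, 9]]
print(new_starts)  # [2, 9, 13]
print([aligned_to_unaligned(aligned_seq, column) for column in new_starts])  # [2, 6, 9]
```

The forward mapping reproduces the newstart column above, and mapping back recovers the original start positions.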

Now we can "float" an epitope to line up with its antigen based on a start position and antigen sequence.

epitopes['float'] = map.float_epitopes(
    table=epitopes,
    parent_seq=aligned_seq,
    start_col="newstart",
    index=1,
)
epitopes
# Aligned antigen
# ABC---DEFGH-IJK--LM

# Aligned epitopes
# -BC---D
# --------FGH-I
# ------------IJK--L
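Floating can be sketched the same way: pad with gaps up to the aligned start, then walk along the aligned antigen, copying its gaps and filling its non-gap columns with the epitope's residues. float_epitope is a hypothetical helper that assumes the epitope's residues match the antigen and adds no trailing gaps; map.float_epitopes may validate or pad differently.

```python
def float_epitope(epitope, aligned_parent, start, index=1, gap="-"):
    """Hypothetical helper: line an ungapped epitope up with an aligned parent sequence."""
    floated = [gap] * (start - index)  # pad every column before the aligned start
    placed = 0
    for char in aligned_parent[start - index:]:
        if placed == len(epitope):
            break
        if char == gap:
            floated.append(gap)  # keep the parent's gaps inside the epitope
        else:
            floated.append(epitope[placed])
            placed += 1
    if placed < len(epitope):
        raise ValueError("epitope runs past the end of the aligned parent")
    return "".join(floated)


aligned_seq = "ABC---DEFGH-IJK--LM"
floated = [
    float_epitope(epitope, aligned_seq, start)
    for epitope, start in [("BCD", 2), ("FGHI", 9), ("IJKL", 13)]
]
print("\n".join(floated))
# -BC---D
# --------FGH-I
# ------------IJK--L
```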

We can easily count the number of epitopes overlapping each position by "stretching" them. For plotting, it is often helpful to add zeros for positions with no epitopes.

stretched_epitopes = stretch.stretch(epitopes)
positional_count = stretched_epitopes.groupby("position").size()
positional_count = stretch.add_empty_positions(
    positional_count,
    parent_seq_length=len(seq),
    index=1,
    empty_value=0
)
positional_count
# position
# 1     0.0
# 2     1.0
# 3     1.0
# 4     1.0
# 5     0.0
# 6     1.0
# 7     1.0
# 8     1.0
# 9     2.0
# 10    1.0
# 11    1.0
# 12    1.0
# 13    0.0
# dtype: float64
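Counting by stretching can also be expressed directly in pandas: expand each epitope into one row per position it covers (end inclusive), count the rows at each position, then reindex over every position in the antigen to add the zeros. This is a minimal sketch assuming 1-based, inclusive coordinates, not the implementation of stretch.stretch or stretch.add_empty_positions.

```python
import pandas as pd

epitopes = pd.DataFrame({"start": [2, 6, 9], "end": [4, 9, 12]})
seq_length = 13  # len("ABCDEFGHIJKLM"), the unaligned antigen
index = 1        # assumption: 1-based positions with an inclusive end

# "Stretch": one row per (epitope, position) pair covered by that epitope.
stretched = pd.DataFrame(
    [
        {"epitope": i, "position": position}
        for i, (start, end) in enumerate(zip(epitopes["start"], epitopes["end"]))
        for position in range(start, end + 1)
    ]
)

# Count epitopes at each covered position...
counts = stretched.groupby("position").size()
# ...then add every uncovered position with a count of zero.
counts = counts.reindex(range(index, seq_length + index), fill_value=0)
print(counts.tolist())  # [0, 1, 1, 1, 0, 1, 1, 1, 2, 1, 1, 1, 0]
```

Because reindex is given fill_value=0, these counts stay integers, whereas the output above is float64.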

Read the cookbook for tips on calculating more interesting measures than counts.

Examples

A real-world example is demonstrated in the quickstart, which analyses and plots the epitopes from different strains of the influenza virus.

The cookbook provides a detailed description and example of all functions.

The full documentation includes the function APIs for each submodule, including the map, stretch, and utils modules used above.

Dev

Details on testing, creating docs, and virtual environments.

Dev: Set up

Set up a virtual environment with an editable install:

  • Create a virtual environment: python3 -m venv .venv
  • Activate it: . .venv/bin/activate
  • Install the package in editable mode: pip install -e .
  • Deactivate it when you are finished: deactivate

Dev: Nox

Linting, security checks with bandit, documentation builds, the examples, and the tests can all be run with nox, as configured in noxfile.py. nox is also run by GitHub Actions.

Dev: Make docs

The full guide is in docs/README.md. In short, pdoc generates the API documentation and renders the README, the Jupyter notebook examples are converted to HTML, and the complete docs are hosted at barinthusbio.github.io/epitope_aligner/index.html.

Generating and hosting the docs is handled by GitHub Actions, but if you want to build them locally, just run nox.

Dev: Publish to PyPI

Uploading requires the build and twine packages: pip install --upgrade build twine.

python -m build creates both the source distribution (--sdist) and the wheel (--wheel). twine check dist/* checks that the package is ready for uploading, and twine upload dist/* uploads it to PyPI.


Download files

Source Distribution

epitope_aligner-0.1.1.tar.gz (234.7 kB)

Uploaded Source

Built Distribution

epitope_aligner-0.1.1-py3-none-any.whl (11.4 kB)

Uploaded Python 3

File details

Details for the file epitope_aligner-0.1.1.tar.gz.

File metadata

  • Download URL: epitope_aligner-0.1.1.tar.gz
  • Size: 234.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.5

File hashes

Hashes for epitope_aligner-0.1.1.tar.gz
Algorithm Hash digest
SHA256 44112bda4b98b2821e9780e1e834b0e925319de957ff51b8a3c974c1e53e1b11
MD5 4094ab3670fd3277e1063bcb177c7068
BLAKE2b-256 c00138a309d734caa08a4f939085b743b311bf93488e5cd7b8ffac21e95a7b53

File details

Details for the file epitope_aligner-0.1.1-py3-none-any.whl.

File hashes

Hashes for epitope_aligner-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 e527eaf29818a3902029f6682983c6b31643e1690611bdc26b65bb2bd4536a72
MD5 32b17a5bc4539cb7b520d8f487ff6b5c
BLAKE2b-256 3d6a7f2aa99709bffaa1fec4d945e49f05c8ab7f259622cb73859b25621e11d8
