A protein-coding gene annotation fixing tool

License: GPLv3. Platforms: macOS, Linux.

LiftOn is a homology-based lift-over tool designed to accurately map GFF or GTF annotations between assemblies. It is built upon the fantastic Liftoff (credits to Dr. Alaina Shumate) and miniprot (credits to Dr. Heng Li), and employs a protein-maximization algorithm to improve the lift-over of protein-coding genes.

Why LiftOn❓

  1. Burgeoning number of genome assemblies: As of December 2023, among the 15,578 distinct eukaryotic genomes, only 1,111 have been annotated (Eukaryotic Genome Annotation at NCBI). More and more high-quality assemblies are being generated, and they need to be annotated accurately.

  2. Improved protein-coding gene mapping: The popular Liftoff maps genes based only on DNA alignment. By adding protein-to-genome alignment, LiftOn further improves the lifted-over protein-coding gene annotations. For example, LiftOn improves on the currently released T2T-CHM13 annotation (JHU RefSeqv110 + Liftoff v5.1).

  3. Improved distant species lift-over: LiftOn extends lift-over from the same or closely related species to more distantly related species. See the mouse_2_rat and drosophila_melanogaster_2_erecta lift-over sections.

LiftOn is free, open source, easy to install, and written in Python!


Who is it for❓

  1. If you have sequenced and assembled a new genome and need to annotate it, LiftOn is the ideal choice for generating annotations.

  2. If you want to do comparative genomics analysis, run LiftOn to lift over and compare annotations!

  3. If you wish to utilize the finest CHM13 annotation, you can run LiftOn! We have also pre-generated the T2T_CHM13_LiftOn.gff3 file for your convenience.


What does LiftOn do❓

Given a reference Genome R, an Annotation RA, and a target Genome T, the lift-over problem is defined as the process of changing the coordinates of Annotation RA from Genome R to Genome T and generating a new annotation file, Annotation TA. A simple illustration of the lift-over problem is shown in Figure 1.

graphics/liftover_illustration.gif

Figure 1. Illustration of the lift-over problem. The annotation file from the reference genome (top) is lifted over to the target genome (bottom).
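
To make the definition concrete, here is a toy Python sketch of coordinate lift-over. It assumes the alignment between Genome R and Genome T is already known as a list of ungapped blocks. The function name `lift_interval` and the block tuples are hypothetical illustrations and are not part of LiftOn, which obtains its mappings from Liftoff and miniprot.

```python
def lift_interval(start, end, blocks):
    """Map a 1-based inclusive interval from Genome R to Genome T.

    `blocks` is a list of (ref_start, ref_end, target_start) tuples, each
    describing one ungapped, same-strand alignment block (an assumption of
    this toy model). Returns None when the interval is not fully contained
    in a single block.
    """
    for ref_start, ref_end, target_start in blocks:
        if ref_start <= start and end <= ref_end:
            offset = target_start - ref_start
            return start + offset, end + offset
    return None


# Two made-up alignment blocks between Genome R and Genome T.
blocks = [(1, 1000, 501), (1001, 2000, 1601)]

lifted = lift_interval(1200, 1500, blocks)    # inside the second block
unmapped = lift_interval(900, 1100, blocks)   # spans a block boundary
print(lifted)    # (1800, 2100)
print(unmapped)  # None
```

A feature that cannot be placed, like the second interval here, is the toy analogue of an unmapped feature in a real lift-over.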


LiftOn is the best tool to help you solve this problem! LiftOn employs a two-step protein maximization algorithm (PM algorithm).

  1. The first module is the chaining algorithm. It starts by extracting protein sequences annotated by Liftoff and miniprot. LiftOn then aligns these sequences to full-length reference proteins. For each gene locus, LiftOn compares each section of the protein alignments from Liftoff and miniprot, chaining together the best combinations.

  2. The second module is the open-reading frame search (ORF search) algorithm. In the case of truncated protein-coding transcripts, this algorithm examines alternative frames to identify the ORF that produces the longest match with the reference protein.
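
The two steps above can be sketched in a heavily simplified form. This is not LiftOn's implementation. It assumes each candidate protein has already been aligned to the reference and reduced to one 0/1 match flag per reference residue, that section boundaries are given, and it uses `difflib.SequenceMatcher` as a stand-in for a real protein aligner. The function names `chain_best_sections`, `find_orfs`, and `best_orf` are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import product

# Standard genetic code, built in TCAG order ('*' marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def chain_best_sections(liftoff_matches, miniprot_matches, boundaries):
    """Step 1 (toy): per section, keep the source matching more reference residues."""
    chain = []
    for start, end in zip(boundaries, boundaries[1:]):
        liftoff_score = sum(liftoff_matches[start:end])
        miniprot_score = sum(miniprot_matches[start:end])
        source = "liftoff" if liftoff_score >= miniprot_score else "miniprot"
        chain.append((source, start, end))
    return chain


def find_orfs(dna):
    """Yield (start, protein) for each ATG-to-stop ORF in the three forward frames."""
    dna = dna.upper()
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            amino = CODON_TABLE.get(codon, "X")  # 'X' for ambiguous codons
            if start is None and codon == "ATG":
                start, protein = i, []
            if start is not None:
                if amino == "*":
                    yield start, "".join(protein)
                    start = None
                else:
                    protein.append(amino)


def best_orf(dna, reference_protein):
    """Step 2 (toy): pick the ORF sharing the most residues with the reference."""
    def score(protein):
        matcher = SequenceMatcher(None, protein, reference_protein, autojunk=False)
        return sum(block.size for block in matcher.get_matching_blocks())
    return max(find_orfs(dna), key=lambda orf: score(orf[1]), default=None)


# Made-up inputs: match flags over an 8-residue reference, split into 3 sections.
chain = chain_best_sections([1, 1, 1, 0, 0, 0, 1, 1],
                            [1, 0, 1, 1, 1, 1, 1, 0],
                            [0, 3, 6, 8])
# A short ORF in frame 0 ("M") competes with a longer one in frame 1 ("MKTAY").
orf = best_orf("ATGTAACATGAAAACCGCCTATTAG", "MKTAY")
```

In this example `chain` alternates between the two sources, and `orf` is the frame-1 ORF starting at offset 7, because it matches all five reference residues while the frame-0 ORF matches only one.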

  • Input:
    1. target Genome T in FASTA

    2. reference Genome R in FASTA

    3. reference Annotation RA in GFF3

  • Output:
    1. LiftOn annotation file, Annotation TA, in GFF3

    2. Protein sequence identities & mutation types

    3. Features with extra copies

    4. Unmapped features
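
Both the reference annotation and the LiftOn output are GFF3, a tab-separated format with nine columns per feature line. As a minimal sketch of inspecting such a file in Python, the snippet below parses a few made-up feature lines and counts features by type. The helper `parse_gff3_line` and the example records are hypothetical and are not part of LiftOn.

```python
from collections import Counter

GFF3_COLUMNS = ("seqid", "source", "type", "start", "end",
                "score", "strand", "phase", "attributes")


def parse_gff3_line(line):
    """Return one GFF3 feature as a dict, or None for comments and blank lines."""
    line = line.rstrip("\n")
    if not line.strip() or line.startswith("#"):
        return None
    fields = line.split("\t")
    if len(fields) != len(GFF3_COLUMNS):
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    feature = dict(zip(GFF3_COLUMNS, fields))
    feature["start"], feature["end"] = int(feature["start"]), int(feature["end"])
    # Column 9 holds semicolon-separated key=value pairs such as ID and Parent.
    feature["attributes"] = dict(
        pair.split("=", 1) for pair in feature["attributes"].split(";") if "=" in pair
    )
    return feature


# Made-up records standing in for the contents of an annotation file.
example = "\n".join([
    "##gff-version 3",
    "\t".join(["chr1", "example", "gene", "100", "900", ".", "+", ".", "ID=gene1"]),
    "\t".join(["chr1", "example", "mRNA", "100", "900", ".", "+", ".", "ID=tx1;Parent=gene1"]),
    "\t".join(["chr1", "example", "CDS", "150", "600", ".", "+", "0", "ID=cds1;Parent=tx1"]),
])

features = [f for f in map(parse_gff3_line, example.splitlines()) if f is not None]
type_counts = Counter(f["type"] for f in features)
print(type_counts)
```

Comparing such counts between Annotation RA and Annotation TA is one quick way to see how many features were carried over.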


LiftOn's limitations

LiftOn's chaining algorithm currently uses only miniprot alignment results to fix the Liftoff annotation. However, it can be extended to chain together multiple protein-based annotation files or assembled RNA-Seq transcripts.

DNA- and protein-based methods still have some limitations. We are developing a module to merge the LiftOn annotation with the released curated annotations to generate better annotations.

The LiftOn chaining algorithm does not currently support multi-threading. This is the next feature we plan to add!


User support

Please go through the documentation below first. If you have questions about using the package, a bug report, or a feature request, please use the GitHub issue tracker here:

https://github.com/Kuanhao-Chao/LiftOn/issues


Key contributors

LiftOn was designed and developed by Kuan-Hao Chao. This documentation was written by Kuan-Hao Chao.



Citation

Kuan-Hao Chao*, Mihaela Pertea, Steven L Salzberg*, "LiftOn: a tool to improve annotations for protein-coding genes during the lift-over process.", bioRxiv 2023.07.27.550754, doi: https://doi.org/10.1101/2023.07.27.550754, 2023



