A protein-coding gene annotation fixing tool

LiftOn

License: MIT

LiftOn is a lift-over annotator that takes Liftoff and miniprot GFF files as input. It generates gene annotations, with a particular focus on protein-coding genes. LiftOn takes the consensus of both sources and produces annotations that outperform those of either Liftoff or miniprot alone.


Why LiftOn❓

  1. The current approach to annotating T2T-CHM13 is to run Liftoff to lift over annotations from GRCh38 to T2T-CHM13. However, Liftoff is not perfect, and the resulting T2T-CHM13 annotation is far from perfect. We need a tool that generates T2T-CHM13 annotations accurately.
  2. More and more high-quality assemblies are being generated, and they need to be annotated.
  3. Current lift-over tools depend mainly on either DNA aligners (Liftoff, minimap2) or protein aligners (miniprot). Neither approach is perfect; each makes mistakes in some cases.

What does LiftOn do❓

LiftOn takes GFF files from Liftoff and miniprot, plus reference protein sequences in a FASTA file, and generates a new annotation file in GFF format. LiftOn works within the same species and between closely related species.

  • Input: Liftoff GFF file / miniprot GFF file / protein FASTA file
  • Output: LiftOn GFF file
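
Since the output is a GFF file, downstream scripts usually start by reading its nine tab-separated columns. The following is a minimal sketch, not part of LiftOn, that groups CDS intervals by their `Parent` attribute. The function name `cds_by_transcript`, the `LiftOn` value in the source column, and the example records are all assumptions made for illustration.

```python
from collections import defaultdict


def cds_by_transcript(gff_lines):
    """Group CDS intervals by their Parent attribute from GFF3 lines."""
    cds = defaultdict(list)
    for line in gff_lines:
        line = line.rstrip("\n")
        # Skip blank lines, directives, and comments.
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        # GFF3 records have exactly nine columns; column 3 is the feature type.
        if len(cols) != 9 or cols[2] != "CDS":
            continue
        # Column 9 holds semicolon-separated key=value attributes.
        attrs = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        parent = attrs.get("Parent")
        if parent is not None:
            # Store (seqid, start, end, strand); coordinates are 1-based, inclusive.
            cds[parent].append((cols[0], int(cols[3]), int(cols[4]), cols[6]))
    # Sort each transcript's CDS intervals by start coordinate.
    for intervals in cds.values():
        intervals.sort(key=lambda iv: iv[1])
    return dict(cds)


# Hypothetical records, not taken from a real LiftOn run.
example = [
    "##gff-version 3",
    "chr1\tLiftOn\tmRNA\t100\t500\t.\t+\t.\tID=tx1;Parent=gene1",
    "chr1\tLiftOn\tCDS\t300\t500\t.\t+\t0\tID=cds2;Parent=tx1",
    "chr1\tLiftOn\tCDS\t100\t201\t.\t+\t0\tID=cds1;Parent=tx1",
]
result = cds_by_transcript(example)
print(result)
```

For a real file, pass an open file handle instead of the list, for example `cds_by_transcript(open("T2T_CHM13_LiftOn.gff3"))`.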

LiftOn uses the gene locus coordinates obtained from Liftoff, because Liftoff applies an overlap-fixing algorithm to determine the most suitable locus for each gene.

First, LiftOn extracts protein sequences annotated by Liftoff and miniprot, and aligns them to the reference proteins.
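
To make the alignment step concrete, here is a small sketch of aligning a candidate protein to its reference and scoring the result. It uses a plain Needleman-Wunsch global alignment with simple match/mismatch/gap scores, which is an assumption chosen for brevity; this README does not say which aligner or scoring scheme LiftOn uses, and real protein aligners typically rely on a substitution matrix. The names `global_align` and `identity` and the toy sequences are hypothetical.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment; returns the two gapped strings."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            if score[i][j] == diag:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def identity(aln_query, aln_ref):
    """Fraction of reference residues matched exactly by the query."""
    ref_len = len(aln_ref.replace("-", ""))
    if ref_len == 0:
        return 0.0
    matches = sum(1 for q, r in zip(aln_query, aln_ref) if q == r and q != "-")
    return matches / ref_len


# Toy sequences for illustration only.
reference = "MKTAYIAKQR"
liftoff_protein = "MKTAYIAKQR"   # identical to the reference
miniprot_protein = "MKTAYAKQR"   # one residue missing

liftoff_identity = identity(*global_align(liftoff_protein, reference))
miniprot_identity = identity(*global_align(miniprot_protein, reference))
print(liftoff_identity, miniprot_identity)
```

Identity is measured against the reference length here so that both candidates are scored on the same scale.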

Next, LiftOn employs an algorithm that compares each section of the protein alignments from Liftoff and miniprot, corrects errors in exon and CDS boundaries, and produces the optimal protein annotations.
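
The idea of comparing the two alignments section by section can be sketched as follows. This is a simplified illustration under stated assumptions, not LiftOn's actual algorithm: it splits the reference protein into fixed-size windows, counts exact matches per window for each source, and breaks ties in favor of Liftoff. The window size, the tie-break, and the function names `match_profile` and `choose_per_window` are hypothetical, and the sketch does not map the chosen sections back to exon or CDS boundaries.

```python
def match_profile(aln_query, aln_ref):
    """Return one boolean per reference residue: True if the query matches it."""
    profile = []
    for q, r in zip(aln_query, aln_ref):
        if r == "-":
            # Insertion relative to the reference; ignored in this sketch.
            continue
        profile.append(q == r)
    return profile


def choose_per_window(profile_liftoff, profile_miniprot, window=5):
    """For each window of reference residues, pick the source with more matches."""
    if len(profile_liftoff) != len(profile_miniprot):
        raise ValueError("profiles must cover the same reference protein")
    choices = []
    total = len(profile_liftoff)
    for start in range(0, total, window):
        end = min(start + window, total)
        liftoff_hits = sum(profile_liftoff[start:end])
        miniprot_hits = sum(profile_miniprot[start:end])
        # Assumption: ties go to Liftoff.
        source = "liftoff" if liftoff_hits >= miniprot_hits else "miniprot"
        choices.append((start, end, source))
    return choices


# Toy pre-aligned sequences (no gaps) for illustration only.
aligned_ref = "MKTAYIAKQR"
aligned_liftoff = "MKTAYIAXXX"    # errors near the C-terminus
aligned_miniprot = "MXXAYIAKQR"   # errors near the N-terminus

choices = choose_per_window(
    match_profile(aligned_liftoff, aligned_ref),
    match_profile(aligned_miniprot, aligned_ref),
)
print(choices)
```

In this toy case the first window is taken from Liftoff and the second from miniprot, so the combined protein matches the reference better than either source alone.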


Who is it for❓

  1. If you have sequenced and assembled a new human genome and need to annotate it, LiftOn is the ideal choice for generating annotations.
  2. If you want to use the best available CHM13 annotation, you can run LiftOn. We have also pre-generated the T2T_CHM13_LiftOn.gff3 file for your convenience.

Installation

LiftOn is on PyPI, and installing from there is the easiest approach. Check out all the releases here.

$ pip install lifton

You can also install LiftOn from source:

$ git clone https://github.com/Kuanhao-Chao/LiftOn --recursive

$ cd LiftOn

$ python setup.py install


Quick Start

Running LiftOn is simple. It only requires one line of code!

Example 1: generate a LiftOn annotation from the test data

$ cd test

# Run LiftOn with the reference proteins and the Liftoff and miniprot databases
$ lifton --proteins protein.fasta --liftoffdb CHM13_MANE.sort.gff3_db --miniprotdb CHM13_MANE_miniprot.fix.sorted.gff_db -o chm13v2.0.fa

Citation

Kuan-Hao Chao*, Mihaela Pertea, Steven L Salzberg*, "LiftOn: a tool to improve annotations for protein-coding genes during the lift-over process.", bioRxiv 2023.07.27.550754, doi: https://doi.org/10.1101/2023.07.27.550754, 2023
