Nanopore Direct RNA-seq Transcriptome Assembly
LAFITE
Low-abundance Aware Full-length Isoform clusTEr
Overview
LAFITE is designed to identify high-consensus full-length isoforms from Nanopore Direct RNA-seq (DRS) data. It combines multiple features from the reference annotation and the DRS reads (transcription start sites (TSS), transcription end sites (TES), splice junctions, and read polyadenylation events) and is more sensitive to low-abundance transcripts.
Prerequisites
- bedtools
- Minimap2
- nanopolish
- samtools
- Python 3.7/3.8/3.9
Installation
To avoid potential conflicts, we recommend running LAFITE in a conda environment.
conda create -n LAFITE_env -c bioconda python=3.7 bedtools
conda activate LAFITE_env
pip install git+https://github.com/TF-Chan-Lab/LAFITE
Usage
- Run minimap2 and samtools to generate an alignment file in BAM format
minimap2 -ax splice -u f -k 14 -G 500000 --secondary=no REFERENCE_FA FASTQ > ALIGNMENT_SAM
samtools view -bS ALIGNMENT_SAM | samtools sort - > ALIGNMENT_BAM
LAFITE also supports other splice-aware long-read alignment tools.
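The two alignment commands above can also be chained so that no intermediate SAM file is written. The following is a minimal sketch, not part of LAFITE: the function names are hypothetical, and the `-t` (minimap2 threads), `-@` and `-o` (samtools sort threads and output) flags are additions that do not appear in the commands above.

```python
import shlex
import subprocess


def build_alignment_commands(reference_fa, fastq, bam_out, threads=4):
    """Return (minimap2_cmd, samtools_cmd) argument lists (hypothetical helper)."""
    minimap2_cmd = [
        "minimap2", "-ax", "splice", "-u", "f", "-k", "14",
        "-G", "500000", "--secondary=no",
        "-t", str(threads),          # assumption: thread flag added for convenience
        reference_fa, fastq,
    ]
    # samtools sort reads SAM from stdin ("-") and writes a sorted BAM.
    samtools_cmd = ["samtools", "sort", "-@", str(threads), "-o", bam_out, "-"]
    return minimap2_cmd, samtools_cmd


def as_shell_pipeline(minimap2_cmd, samtools_cmd):
    """Render both commands as one shell pipeline string, for logging."""
    return " | ".join(
        " ".join(shlex.quote(arg) for arg in cmd)
        for cmd in (minimap2_cmd, samtools_cmd)
    )


def run_alignment(reference_fa, fastq, bam_out, threads=4):
    """Run minimap2 piped into samtools sort; both tools must be on PATH."""
    minimap2_cmd, samtools_cmd = build_alignment_commands(
        reference_fa, fastq, bam_out, threads
    )
    aligner = subprocess.Popen(minimap2_cmd, stdout=subprocess.PIPE)
    try:
        subprocess.run(samtools_cmd, stdin=aligner.stdout, check=True)
    finally:
        aligner.stdout.close()
    if aligner.wait() != 0:
        raise RuntimeError("minimap2 exited with a non-zero status")
```

Calling `as_shell_pipeline(*build_alignment_commands("REFERENCE_FA", "FASTQ", "ALIGNMENT_BAM"))` shows the equivalent one-line pipeline before anything is executed.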
- Run Nanopolish polya to generate the read polyadenylation result (optional but recommended)
Current long-read sequencing technologies (Nanopore cDNA/DRS or PacBio Iso-Seq) are all designed to capture RNA molecules with a poly(A) tail. However, RNA fragmentation and pore blocking can produce a considerable fraction of truncated reads, which interfere with downstream analysis. Therefore, LAFITE uses the read polyadenylation status reported by Nanopolish to select reads that completed the sequencing process.
nanopolish index -d PATH_TO_FAST5 -s GUPPY_SEQUENCING_SUMMARY FASTQ
nanopolish polya -t NUM_OF_THREADS -r FASTQ -b ALIGNMENT_BAM -g REFERENCE_FA > Nanopolish_PolyA_RES
LAFITE also provides an alternative approach that estimates read polyadenylation status by scanning for poly(A) motifs at the read 3'-end.
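Before passing the Nanopolish result to LAFITE, it can be useful to check how many reads received each QC tag. The sketch below only summarizes the file; it does not reproduce how LAFITE interprets it. It assumes the output is tab-separated with a header row containing the columns `readname`, `polya_length`, and `qc_tag`, and the function name is hypothetical.

```python
import csv
import io
from collections import Counter


def summarize_polya(handle):
    """Count reads per qc_tag and collect tail lengths of PASS reads.

    Assumes a tab-separated file with a header containing at least
    'readname', 'polya_length' and 'qc_tag' (an assumption of this sketch).
    """
    tag_counts = Counter()
    pass_lengths = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        tag_counts[row["qc_tag"]] += 1
        if row["qc_tag"] == "PASS":
            pass_lengths[row["readname"]] = float(row["polya_length"])
    return tag_counts, pass_lengths


# Tiny in-memory example with made-up reads.
example = io.StringIO(
    "readname\tcontig\tpolya_length\tqc_tag\n"
    "read_1\tchr1\t85.5\tPASS\n"
    "read_2\tchr1\t12.0\tADAPTER\n"
    "read_3\tchr2\t40.0\tPASS\n"
)
tag_counts, pass_lengths = summarize_polya(example)
print(dict(tag_counts))
print(sorted(pass_lengths))
```

For a real run, replace the in-memory example with `open("Nanopolish_PolyA_RES")`.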
- Run LAFITE
usage: lafite [-h] -b BAM [-B BEDTOOLS] -g GTF -f GENOME -o OUTPUT
              [-n MIN_COUNT_TSS_TES] [-i MIS_INTRON_LENGTH]
              [-c MIN_NOVEL_TRANS_COUNT] [-s MIN_SINGLE_EXON_COVERAGE]
              [-l MIN_SINGLE_EXON_LEN] [-L LABEL] [-p POLYA]
              [-m POLYA_MOTIF_FILE] [-r RELATIVE_ABUNDANCE_THRESHOLD]
              [-j SHORT_SJ_TAB] [-w SJ_CORRECTION_WINDOW] [--no_full_cleanup]
              [-t THREAD] [-T TSS_PEAK] [-d TSS_CUTOFF]

Low-abundance Aware Full-length Isoform clusTEr

optional arguments:
  -h, --help            show this help message and exit
  -b BAM                path to the alignment file in bam format
  -B BEDTOOLS           path to the executable bedtools
  -g GTF                path to the reference gene annotation in GTF format
  -f GENOME             path to the reference genome fasta
  -o OUTPUT             path to the output file
  -n MIN_COUNT_TSS_TES  minimum number of reads supporting a alternative TSS or TES, default: 3
  -i MIS_INTRON_LENGTH  length cutoff for correcting unexpected small intron, default: 150
  -c MIN_NOVEL_TRANS_COUNT
                        minimum occurrences required for a isoform from novel loci, default: 3
  -s MIN_SINGLE_EXON_COVERAGE
                        minimum read coverage required for a novel single-exon transcript, default: 4
  -l MIN_SINGLE_EXON_LEN
                        minimum length for single-exon transcript, default: 100
  -L LABEL              name prefix for output transcripts, default: LAFT
  -p POLYA              path to the file contains read Polyadenylation event
  -m POLYA_MOTIF_FILE   path to the polya motif file
  -r RELATIVE_ABUNDANCE_THRESHOLD
                        minimum abundance of the predicted multi-exon transcripts as a fraction of the total transcript assembled at a given locus, default: 0.01
  -j SHORT_SJ_TAB       path to the short read splice junction file
  -w SJ_CORRECTION_WINDOW
                        edit distance to reference splicing site for splicing correction, default: 40
  --no_full_cleanup     keep all intermediate files
  -t THREAD             number of the threads, default: 4
  -T TSS_PEAK           path to the TSS peak file
  -d TSS_CUTOFF         minimum TSS distance for a transcript to be considered as a novel transcript
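The `-r` option above is described as a minimum fraction of the total transcripts assembled at a locus. The arithmetic can be illustrated with a small sketch. This is not LAFITE's implementation: the function name, the use of read counts as the abundance measure, and the choice of `>=` at the boundary are all assumptions.

```python
def filter_by_relative_abundance(locus_counts, threshold=0.01):
    """Keep transcripts whose share of the locus total is at least `threshold`.

    locus_counts maps a transcript id to its supporting read count at one
    locus (hypothetical input; LAFITE's internal abundance measure may differ).
    """
    total = sum(locus_counts.values())
    if total == 0:
        return {}
    return {
        transcript_id: count
        for transcript_id, count in locus_counts.items()
        if count / total >= threshold
    }


# Made-up locus with 1000 reads in total: shares are 0.95, 0.045 and 0.005.
locus = {"iso_a": 950, "iso_b": 45, "iso_c": 5}
kept = filter_by_relative_abundance(locus, threshold=0.01)
print(kept)
```

With the default of 0.01, `iso_c` (5 of 1000 reads, 0.5%) falls below the cutoff, while `iso_b` (4.5%) is retained.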
- LAFITE can run with the following arguments:
lafite -b ALIGNMENT_BAM -g REFERENCE_GTF -f REFERENCE_FA -o OUTPUT_GTF -t NUM_OF_THREADS -p Nanopolish_PolyA_RES
- LAFITE can also run without the result from Nanopolish polya. In that case, a poly(A) motif list must be provided for the corresponding species. We have provided the poly(A) motif lists for human and mouse, retrieved from Tian et al.
lafite -b ALIGNMENT_BAM -g REFERENCE_GTF -f REFERENCE_FA -o OUTPUT_GTF -t NUM_OF_THREADS -m POLYA_MOTIFS_OF_SPECIES
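When LAFITE is launched from a script, the choice between the Nanopolish result (`-p`) and the motif list (`-m`) can be made explicit. The following is a minimal sketch with a hypothetical helper name. It uses only flags listed in the usage message, and it treats `-p` and `-m` as alternatives, which mirrors the two example commands but is an assumption about the CLI.

```python
import subprocess


def build_lafite_command(bam, gtf, genome, output, threads=4,
                         polya=None, polya_motif=None):
    """Assemble a lafite argument list (hypothetical helper, not part of LAFITE)."""
    if (polya is None) == (polya_motif is None):
        # Assumption of this sketch: exactly one poly(A) source is given.
        raise ValueError("provide exactly one of polya (-p) or polya_motif (-m)")
    cmd = [
        "lafite",
        "-b", bam,
        "-g", gtf,
        "-f", genome,
        "-o", output,
        "-t", str(threads),
    ]
    if polya is not None:
        cmd += ["-p", polya]
    else:
        cmd += ["-m", polya_motif]
    return cmd


def run_lafite(**kwargs):
    """Run lafite and raise if it exits with a non-zero status."""
    subprocess.run(build_lafite_command(**kwargs), check=True)


cmd = build_lafite_command(
    "ALIGNMENT_BAM", "REFERENCE_GTF", "REFERENCE_FA", "OUTPUT_GTF",
    threads=8, polya_motif="POLYA_MOTIFS_OF_SPECIES",
)
print(" ".join(cmd))
```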
- LAFITE accepts TSS peaks from 5'-end CAGE data for identifying high-confidence TSSs. Users can prepare the TSS data in the following format:
  - The first column is the chromosome name
  - The second column is the 0-based start position of the TSS peak
  - The third column is the 1-based end position of the TSS peak
  - The fourth column is the strand information
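A quick sanity check of a TSS peak file against the four columns listed above can catch coordinate mistakes early. This is a minimal sketch with a hypothetical function name. It assumes the file is tab-separated with no header, that the strand is `+` or `-`, and that blank lines and lines starting with `#` can be skipped; none of these details are stated above.

```python
def parse_tss_peaks(lines):
    """Parse and validate 4-column TSS peak lines (chrom, start, end, strand)."""
    peaks = []
    for line_number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # assumption: comments and blank lines are ignorable
        fields = line.split("\t")
        if len(fields) != 4:
            raise ValueError(f"line {line_number}: expected 4 columns, got {len(fields)}")
        chrom, start, end, strand = fields
        start, end = int(start), int(end)
        # 0-based start with 1-based end means a valid peak has end > start.
        if start < 0 or end <= start:
            raise ValueError(f"line {line_number}: invalid interval {start}-{end}")
        if strand not in ("+", "-"):
            raise ValueError(f"line {line_number}: unexpected strand {strand!r}")
        peaks.append((chrom, start, end, strand))
    return peaks


# Made-up peaks for illustration.
example_lines = ["chr1\t11868\t11873\t+\n", "chr1\t29365\t29370\t-\n"]
peaks = parse_tss_peaks(example_lines)
print(peaks)
```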
- LAFITE also accepts splice junctions from Illumina short-read RNA-seq data to verify the long reads. LAFITE supports the SJ.out.tab file from the STAR aligner. Users can also prepare the splice junctions in the following format:
  - The first column is the chromosome name
  - The second column is the 0-based start position of the splicing junction
  - The third column is the 1-based end position of the splicing junction
  - The fourth column is the strand information
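Since LAFITE reads STAR's SJ.out.tab directly, no conversion is needed for STAR users. The sketch below only illustrates the coordinate convention of the four-column format by deriving it from STAR-style records. It assumes that the junction start and end correspond to the intron boundaries, that STAR reports 1-based intron start and end with strand coded as 0 (undefined), 1 (+), 2 (-), and that the output is tab-separated. The function name is hypothetical.

```python
STAR_STRAND = {"1": "+", "2": "-"}  # "0" (undefined strand) is skipped below


def star_sj_to_four_columns(lines):
    """Convert STAR SJ.out.tab lines to (chrom, start0, end1, strand) tuples."""
    junctions = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        chrom, intron_start, intron_end, strand_code = fields[:4]
        strand = STAR_STRAND.get(strand_code)
        if strand is None:
            continue  # assumption: junctions without a strand are dropped
        # 1-based inclusive start becomes 0-based; the end stays 1-based.
        junctions.append((chrom, int(intron_start) - 1, int(intron_end), strand))
    return junctions


# Made-up STAR records: the second has an undefined strand and is skipped.
star_lines = [
    "chr1\t14830\t14969\t2\t2\t1\t10\t0\t30\n",
    "chr1\t15039\t15795\t0\t0\t0\t3\t0\t25\n",
]
junctions = star_sj_to_four_columns(star_lines)
for junction in junctions:
    print("\t".join(str(value) for value in junction))
```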
Development
LAFITE was developed following the fastai/nbdev framework.