Splice aligner of long transcriptomic reads to genome.
uLTRA
uLTRA is a tool for splice alignment of long transcriptomic reads to a genome, guided by a database of exon annotations. uLTRA is particularly accurate when aligning to small exons (see some examples).
uLTRA is distributed as a Python package supported on Linux / OSX with Python 3.4 or above.
INSTALLATION
Conda recipe
There is a bioconda recipe, docker image, and a singularity container of uLTRA v0.0.4 created by sguizard. You can use, e.g., the bioconda recipe for an easy automated installation.
If a newer version of uLTRA is not available through bioconda (or you simply want more control over your installation), alternative installation methods are provided below. The current version of uLTRA is 0.1 (see changelog at end of this readme).
Using the INSTALL.sh script
You can clone this repository and run the script INSTALL.sh as:
git clone https://github.com/ksahlin/uLTRA.git --depth 1
cd uLTRA
./INSTALL.sh <install_directory>
The install script has been tested in a bash environment.
To run uLTRA, you need to activate the conda environment "ultra":
conda activate ultra
Without the INSTALL.sh script
You can also perform the steps below manually for more control.
1. Create conda environment
Create a conda environment called ultra and activate it
conda create -n ultra python=3 pip
conda activate ultra
2. Install uLTRA
pip install ultra-bioinformatics
3. Install third party tools
Install namfinder and minimap2 and place the generated binaries namfinder and minimap2 in your path.
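A quick way to confirm that both binaries are visible before moving on is to ask the shell where it finds them. The sketch below is only an illustration: missing_tools is a hypothetical helper name, not part of uLTRA, and it relies on nothing but the standard command -v lookup.

```shell
# Hypothetical helper (not part of uLTRA): print the name of every requested
# tool that cannot be found on PATH. Exit status is non-zero if any is missing.
# The body runs in a subshell, so its variables do not leak into your session.
missing_tools() (
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
            status=1
        fi
    done
    exit "$status"
)

# Example call, using the tool names from this step:
#   missing_tools namfinder minimap2 || echo "add the tools listed above to your PATH"
```

If the call prints nothing, both tools were found.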
4. Verify installation
You should now have uLTRA installed; try it with:
uLTRA --help
Each time you start a new session on your server or computer, you need to activate the conda environment "ultra" before running uLTRA:
conda activate ultra
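In batch or job scripts it is easy to forget the activation step. As a minimal sketch, a guard like the one below can stop a script early. It assumes the environment was activated by name, in which case conda exports that name in the CONDA_DEFAULT_ENV variable; in_conda_env is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only if the active conda environment is the
# one named in $1 (default: ultra). `conda activate NAME` sets
# CONDA_DEFAULT_ENV to NAME.
in_conda_env() {
    [ "${CONDA_DEFAULT_ENV:-}" = "${1:-ultra}" ]
}

# Example guard at the top of a job script:
#   in_conda_env ultra || { echo "run: conda activate ultra" >&2; exit 1; }
```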
You can also download the test data available in this repository and run:
uLTRA pipeline /your/full/path/to/test/SIRV_genes.fasta \
/your/full/path/to/test/SIRV_genes_C_170612a.gtf \
/your/full/path/to/test/reads.fa outfolder/ [optional parameters]
Entirely from source
Make sure the dependencies listed below are installed (installation links below). All of them except namfinder can be installed with pip install X or through conda.
With these dependencies installed, run:
git clone https://github.com/ksahlin/uLTRA.git
cd uLTRA
./uLTRA
USAGE
uLTRA can be used with either PacBio Iso-Seq or ONT cDNA/dRNA reads.
Indexing
uLTRA index genome.fasta /full/path/to/annotation.gtf outfolder/ [parameters]
Important parameters:
--disable_infer can speed up the indexing considerably, but it only works if you have the gene feature and transcript feature in your GTF file.
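To decide whether --disable_infer is applicable, you can look at the feature column (the third tab-separated column) of the annotation. The following is a rough sketch rather than a full validation: gtf_has_gene_and_transcript is a hypothetical helper name, and it only checks that at least one gene row and one transcript row exist.

```shell
# Hypothetical helper: succeed if column 3 of the GTF file in $1 contains
# at least one "gene" row and at least one "transcript" row.
gtf_has_gene_and_transcript() {
    awk -F '\t' '
        /^#/ { next }                 # skip comment/header lines
        $3 == "gene"       { g = 1 }
        $3 == "transcript" { t = 1 }
        END { exit !(g && t) }        # exit 0 only if both were seen
    ' "$1"
}

# Example use when indexing:
#   if gtf_has_gene_and_transcript /full/path/to/annotation.gtf; then
#       uLTRA index genome.fasta /full/path/to/annotation.gtf outfolder/ --disable_infer
#   else
#       uLTRA index genome.fasta /full/path/to/annotation.gtf outfolder/
#   fi
```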
Aligning
For example:
uLTRA align genome.fasta reads.[fa/fq] outfolder/ --ont --t 8 # ONT cDNA reads using 8 cores
uLTRA align genome.fasta reads.[fa/fq] outfolder/ --isoseq --t 8 # PacBio isoseq reads
Important parameters:
- --index [PATH]: You can set a custom location to read the index from; otherwise, uLTRA will try to read the index from outfolder/ by default.
- --prefix [PREFIX OF FILE]: The aligned reads will be written to outfolder/reads.sam unless --prefix is set. For example, --prefix sample_X will output the reads in outfolder/sample_X.sam.
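When several read files are aligned into the same outfolder, giving each run its own --prefix keeps the SAM files from overwriting each other. The sketch below only prints the commands, using the flags shown above. ultra_align_cmds is a hypothetical helper, the prefix is assumed to be the read file name without its extension, and paths are assumed to contain no whitespace.

```shell
# Hypothetical helper: print one `uLTRA align` command per reads file, each
# with its own --prefix. Nothing is executed here.
# Usage: ultra_align_cmds GENOME INDEX_DIR OUTFOLDER READS_FILE...
ultra_align_cmds() (
    genome=$1
    index_dir=$2
    outfolder=$3
    shift 3
    for reads in "$@"; do
        base=${reads##*/}      # strip the directory part
        prefix=${base%.*}      # strip the last extension: sample_X.fq -> sample_X
        printf 'uLTRA align %s %s %s --ont --t 8 --index %s --prefix %s\n' \
            "$genome" "$reads" "$outfolder" "$index_dir" "$prefix"
    done
)

# Example: review the commands first, then run them by piping to sh:
#   ultra_align_cmds genome.fasta index_dir/ outfolder/ sample_X.fq sample_Y.fq
#   ultra_align_cmds genome.fasta index_dir/ outfolder/ sample_X.fq sample_Y.fq | sh
```

Swap --ont for --isoseq in the printf line when working with PacBio reads.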
Pipeline
Perform all the steps in one command:
uLTRA pipeline genome.fasta /full/path/to/annotation.gtf reads.fa outfolder/ [parameters]
Common errors
Not having a properly formatted GTF file. Before running uLTRA, note that it requires a properly formatted GTF file. If you have a GFF file or another annotation format, it is advised to use AGAT to convert it to GTF, as many other conversion tools do not respect the GTF format. For example, you can run AGAT as:
agat_convert_sp_gff2gtf.pl --gff annot.gff3 --gtf annot.gtf
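A cheap first check on any annotation file is whether every record has the nine tab-separated fields that GTF requires. The sketch below covers only that one property and is not a substitute for a proper conversion with AGAT; gtf_bad_lines is a hypothetical helper name.

```shell
# Hypothetical helper: print the line number of every record in $1 that does
# not have exactly 9 tab-separated fields. Comment lines (starting with #)
# and empty lines are ignored. Exit status is 1 if any such line is found.
gtf_bad_lines() {
    awk -F '\t' '
        /^#/ || /^$/ { next }
        NF != 9 { print NR; bad = 1 }
        END { exit (bad ? 1 : 0) }
    ' "$1"
}

# Example call:
#   gtf_bad_lines annot.gtf || echo "annot.gtf has malformed lines (numbers listed above)"
```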
CREDITS
Please cite
- Kristoffer Sahlin, Veli Mäkinen, Accurate spliced alignment of long RNA sequencing reads, Bioinformatics, Volume 37, Issue 24, 15 December 2021, Pages 4643–4651, https://doi.org/10.1093/bioinformatics/btab540
when using uLTRA. Please also cite minimap2, as uLTRA incorporates minimap2 for alignment of some genomic reads outside indexed regions. For example: "We aligned reads to the genome using uLTRA [1], which incorporates minimap2 [CIT]."
LICENCE
GPL v3.0, see LICENSE.txt.