Core Sequence Identifier
CORSID
CORSID is a computational tool for simultaneously identifying transcription regulatory sequence (TRS) sites and gene locations in unannotated coronavirus genomes. We also provide a second tool, CORSID-A, that identifies only TRS sites, given annotated genes. Given a genome (optionally with its annotation), CORSID(-A) finds the TRS alignment and the core sequence.
The data and results can be found in the repo CORSID-data. The visualized results of our tool applied to 468 coronavirus genomes can be found in CORSID-viz.
Contents
- Pre-requisites
- Installation
  - Using conda (recommended)
  - Using pip (alternative)
- Usage instructions
Pre-requisites
- python3 (>=3.7)
- numpy
- pysam
- pandas
- pytablewriter
- (optional for simulation pipeline) snakemake (>=5.2.0)
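A quick way to confirm that the current environment satisfies the list above is to probe for each module without importing it. The following is a minimal sketch, not part of CORSID itself; the helper names `missing_modules` and `python_ok` are hypothetical, and the module names are assumed to match the package names listed above.

```python
import sys
from importlib.util import find_spec

# Module names taken from the pre-requisites list; snakemake is optional.
REQUIRED = ["numpy", "pysam", "pandas", "pytablewriter"]
OPTIONAL = ["snakemake"]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


def python_ok(version_info=sys.version_info, minimum=(3, 7)):
    """Return True when the given interpreter version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    print("Python >= 3.7:", python_ok())
    print("Missing required:", missing_modules(REQUIRED))
    print("Missing optional:", missing_modules(OPTIONAL))
```

An empty "Missing required" list suggests the dependencies are importable; it does not check their versions.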
Installation
Using conda (recommended)
1. Create a new conda environment named "corsid" and install dependencies:
   conda create -n corsid
2. Then activate the created environment:
   conda activate corsid
3. Install the package into current environment "corsid":
   conda install -c bioconda corsid
Using pip (alternative)
We recommend installing in a virtual environment, as described in steps 1 and 2 of the previous section.
Use pip to install the package:
pip install corsid
Usage instructions
I/O formats
CORSID takes a FASTA file containing the complete genome as input. Optionally it also takes an annotation file (GFF format) to validate the identified genes.
CORSID-A takes a FASTA file and an annotation file (GFF format) as input. It finds candidate regions for each gene from the annotation file and then searches for TRS sites within those candidate regions.
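Before running either tool, it can help to sanity-check the two input files. The sketch below uses only the standard library and is independent of CORSID; the function names are hypothetical. It reports each FASTA record with its sequence length and counts the feature lines in a GFF file, relying only on the fact that GFF feature lines have nine tab-separated columns.

```python
import io


def fasta_records(handle):
    """Yield (header, sequence_length) for each record in a FASTA text stream."""
    header, length = None, 0
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, length
            header, length = line[1:], 0
        elif header is not None:
            length += len(line)
    if header is not None:
        yield header, length


def gff_feature_count(handle):
    """Count non-comment lines with exactly nine tab-separated columns."""
    count = 0
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue
        if len(line.rstrip("\n").split("\t")) == 9:
            count += 1
    return count


if __name__ == "__main__":
    # Toy in-memory data; replace with open("genome.fasta") etc. for real files.
    toy_fasta = io.StringIO(">toy_genome\nACGTAC\nGT\n")
    toy_gff = io.StringIO("##gff-version 3\ntoy_genome\tsrc\tgene\t1\t8\t.\t+\t.\tID=g1\n")
    print(list(fasta_records(toy_fasta)))
    print(gff_feature_count(toy_gff))
```

A complete genome is expected to appear as a single FASTA record, so more than one record in the first result is worth investigating before running the tool.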
The output is a JSON file containing sorted solutions and auxiliary information. This file can be used as the input to the visualization webapp (link). The program also writes to standard output, where it shows tables of solutions and a visualization of the TRS alignment.
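The layout of the JSON output is not described here, so the following sketch makes no assumption about its keys. It only loads a JSON document and reports its top-level shape, which is a reasonable first step before writing code against the file. The function names and the file name `result.json` are hypothetical.

```python
import json


def summarize(obj, max_items=5):
    """Describe the top-level shape of a parsed JSON value without assuming a schema."""
    if isinstance(obj, dict):
        # Map the first few keys to the type name of their values.
        return {key: type(value).__name__ for key, value in list(obj.items())[:max_items]}
    if isinstance(obj, list):
        return {"length": len(obj), "item_types": sorted({type(x).__name__ for x in obj})}
    return {"type": type(obj).__name__}


def summarize_file(path):
    """Load a JSON file and summarize its top-level structure."""
    with open(path) as fh:
        return summarize(json.load(fh))


if __name__ == "__main__":
    # Toy document standing in for a real output file such as "result.json".
    print(summarize(json.loads('{"a": [1, 2], "b": "text"}')))
```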