Core Sequence Identifier

CORSID

CORSID is a computational tool that simultaneously identifies transcription regulatory sequence (TRS) sites, the core sequence, and gene locations in an unannotated coronavirus genome sequence. A companion tool, CORSID-A, identifies TRS sites and the core sequence in a coronavirus genome sequence with annotated gene locations.

The data and results can be found in the repo CORSID-data. The visualized results of our tool applied to 468 coronavirus genomes can be found in CORSID-viz.

Contents

  1. Pre-requisites
  2. Installation
  3. Usage instructions

Pre-requisites

Installation

Using conda (recommended)

  1. Create a new conda environment named "corsid" and install dependencies:

    conda create -n corsid python=3.7
    
  2. Then activate the created environment: conda activate corsid.

  3. Install the package into the current environment "corsid":

    conda install -c bioconda corsid
    

Using pip (alternative)

We recommend installing in a virtual environment, as described in steps 1 and 2 of the previous section. Then use pip to install the package:

pip install corsid
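
As a quick sanity check after either installation route, the short Python sketch below looks up the two commands used later in this document (corsid and corsid_a) on the PATH. It assumes both are installed as command-line programs in the active environment; the helper name find_executables is made up for this illustration and is not part of CORSID.

```python
import shutil


def find_executables(names=("corsid", "corsid_a")):
    """Map each command name to its full path, or None if it is not on PATH.

    Hypothetical helper for illustration; not part of the CORSID package.
    """
    return {name: shutil.which(name) for name in names}


# Report which commands were found in the current environment.
for name, path in find_executables().items():
    status = path if path is not None else "not found on PATH"
    print(f"{name}: {status}")
```

If a command is reported as not found, make sure the "corsid" environment is activated before retrying.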

Usage instructions

I/O formats

CORSID takes a FASTA file containing the complete genome as input. Optionally it also takes an annotation file (GFF format) to validate the identified genes.

CORSID-A takes a FASTA file and an annotation file (GFF format) as input. It finds candidate regions for each gene from the annotation file and runs the CORSID-A algorithm on those regions.

The output is a JSON file containing sorted solutions and auxiliary information. This file can be used as the input to the visualization webapp. The program also prints tables of solutions and a visualization of the TRS alignment to standard output.
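
The layout of the output JSON is not described here, so the sketch below deliberately assumes nothing about it: it only loads the file with the standard library and reports the type of each top-level entry. The function name summarize_json is hypothetical and is not part of CORSID.

```python
import json
from pathlib import Path


def summarize_json(path):
    """Return a shallow summary of a JSON file without assuming its schema.

    Hypothetical helper for illustration; not part of the CORSID package.
    """
    with Path(path).open() as handle:
        data = json.load(handle)
    if isinstance(data, dict):
        # Map each top-level key to the type name of its value.
        return {key: type(value).__name__ for key, value in data.items()}
    if isinstance(data, list):
        # For a top-level array, report only its length.
        return {"<list>": len(data)}
    return {"<scalar>": type(data).__name__}
```

For example, calling summarize_json("test/NC_045512.json") after running the example in the next section shows which top-level entries the file contains before you explore it further.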

Example

After installation, you can check if the program runs correctly by analyzing the SARS-CoV-2 genome (NC_045512) as follows:

git clone git@github.com:elkebir-group/CORSID.git
cd CORSID
corsid -f test/NC_045512.fasta -o test/NC_045512.json > test/NC_045512.txt

You can find a list of solutions displayed as tables in test/NC_045512.txt. The best solution should match the figure below:

[Figure: Expected result]

You can also use option -g test/NC_045512.gff to validate the identified genes.

corsid -f test/NC_045512.fasta -g test/NC_045512.gff \
    -o test/NC_045512.json > test/NC_045512.txt

The result will look like the figure below:

[Figure: Expected result]

Similarly, you can run CORSID-A with the command:

corsid_a -f test/NC_045512.fasta -g test/NC_045512.gff \
    -o test/NC_045512.corsid_a.json > test/NC_045512.corsid_a.txt
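
To script the commands above, for instance over several genomes, one option is to assemble the argument list in Python and call the program with subprocess. This is a minimal sketch that uses only the -f, -g and -o options shown in this document; the helper names build_command, format_command and run are made up for this illustration.

```python
import shlex
import subprocess


def build_command(fasta, output_json, gff=None, program="corsid"):
    """Build the argument list for a corsid or corsid_a run.

    Hypothetical helper; only the options shown in this document are used.
    """
    cmd = [program, "-f", fasta]
    if gff is not None:
        # The annotation file is optional for corsid and needed for corsid_a.
        cmd += ["-g", gff]
    cmd += ["-o", output_json]
    return cmd


def format_command(cmd):
    """Render the argument list as a shell-quoted string for display."""
    return " ".join(shlex.quote(arg) for arg in cmd)


def run(cmd, log_path):
    """Run the command and write its standard output to log_path."""
    with open(log_path, "w") as log:
        subprocess.run(cmd, stdout=log, check=True)


cmd = build_command(
    "test/NC_045512.fasta",
    "test/NC_045512.corsid_a.json",
    gff="test/NC_045512.gff",
    program="corsid_a",
)
print(format_command(cmd))
# run(cmd, "test/NC_045512.corsid_a.txt")  # uncomment once CORSID is installed
```

Passing the arguments as a list avoids shell quoting problems with unusual file names, and check=True raises an exception if the program exits with an error.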

Download files

Source Distribution

corsid-0.1.1.tar.gz (15.4 kB)

Uploaded Source

Built Distribution

corsid-0.1.1-py3-none-any.whl (17.4 kB)

Uploaded Python 3
