PARRIS: Profiling and Annotating ciRcular RNA with Iso-Seq

Getting started

Installation

PARRIS is written in Python 2. Install it with pip:

pip install parris

Alternatively, install PARRIS from source:

git clone https://github.com/yangao07/PARRIS.git
cd PARRIS
python setup.py install          # install main package
pip install -r requirements.txt  # install dependencies

Also make sure that samtools (>= v1.6) and bedtools (>= v2.26.0) are installed on your system.
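The version requirements above can be checked before the first run. The following is a minimal sketch, not part of PARRIS: the helper names (`parse_version`, `meets_minimum`, `check_tool`) are hypothetical, and it assumes that both tools print their version in response to `--version`. It uses Python 3 standard-library calls, so run it as a separate script rather than in the Python 2 environment used for PARRIS.

```python
import re
import shutil
import subprocess

# Minimum versions taken from the installation notes above.
REQUIRED = {"samtools": (1, 6), "bedtools": (2, 26, 0)}


def parse_version(text):
    """Return the first dotted version number in text as a tuple of ints, or None."""
    match = re.search(r"(\d+(?:\.\d+)+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def meets_minimum(found, minimum):
    """Compare two version tuples, padding the shorter one with zeros."""
    width = max(len(found), len(minimum))
    found = tuple(found) + (0,) * (width - len(found))
    minimum = tuple(minimum) + (0,) * (width - len(minimum))
    return found >= minimum


def check_tool(name, minimum):
    """Return a one-line status string for one external tool (hypothetical helper)."""
    if shutil.which(name) is None:
        return "%s: not found on PATH" % name
    # Assumption: the tool reports its version when called with --version.
    result = subprocess.run([name, "--version"], stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, universal_newlines=True)
    version = parse_version(result.stdout)
    if version is None:
        return "%s: could not parse version" % name
    status = "ok" if meets_minimum(version, minimum) else "too old"
    return "%s: %s (%s)" % (name, ".".join(map(str, version)), status)


if __name__ == "__main__":
    for tool, minimum in sorted(REQUIRED.items()):
        print(check_tool(tool, minimum))
```

Versions are compared as integer tuples rather than strings, so that a release such as 1.10 is correctly treated as newer than 1.6.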

Running PARRIS

Example command #1:

parris -t 8 long_circRNA.fa reference.fa gene_anno.gtf circRNA.bed output_folder

Example command #2:

parris -t 8 long_circRNA.fa reference.fa gene_anno.gtf circRNA.bed output_folder \
    --short-read short_read.fa \
    --Alu ./anno/hg19/alu.bed   \
    --all-repeat ./anno/hg19/all_repeat.bed
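When PARRIS is called from a pipeline script, the same command line can be assembled programmatically. The sketch below is an illustration only: `build_parris_command` and `run_parris` are hypothetical helpers, the file names in the usage example are placeholders, and only options listed in the help text (`-t`, `--short-read`, `--Alu`, `--all-repeat`) are used. Multiple or paired-end short-read files are joined with ',' as the help text for `--short-read` describes.

```python
import subprocess


def build_parris_command(long_reads, reference, gene_gtf, circ_anno, out_dir,
                         threads=8, short_reads=None, alu_bed=None,
                         all_repeat_bed=None):
    """Return the parris argument list for the given inputs (hypothetical helper)."""
    cmd = ["parris", "-t", str(threads),
           long_reads, reference, gene_gtf, circ_anno, out_dir]
    if short_reads:
        # Several short-read files are passed as one comma-separated value.
        cmd += ["--short-read", ",".join(short_reads)]
    if alu_bed:
        cmd += ["--Alu", alu_bed]
    if all_repeat_bed:
        cmd += ["--all-repeat", all_repeat_bed]
    return cmd


def run_parris(cmd):
    """Run the assembled command and raise CalledProcessError on a non-zero exit."""
    subprocess.check_call(cmd)


if __name__ == "__main__":
    # Placeholder file names; this only prints the command, it does not run it.
    command = build_parris_command(
        "long_circRNA.fa", "reference.fa", "gene_anno.gtf", "circRNA.bed",
        "output_folder",
        short_reads=["short_1.fa", "short_2.fa"],
        alu_bed="./anno/hg19/alu.bed",
        all_repeat_bed="./anno/hg19/all_repeat.bed")
    print(" ".join(command))
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.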

Detailed arguments:

parris -h
usage: parris [-h] [-v] [-t THREADS] [--short-read short.fa] [--lordec LORDEC]
              [--kmer KMER] [--solid SOLID] [--trf TRF] [--match MATCH]
              [--mismatch MISMATCH] [--indel INDEL] [--match-frac MATCH_FRAC]
              [--indel-frac INDEL_FRAC] [--min-score MIN_SCORE]
              [--max-period MAX_PERIOD] [--fxtools FXTOOLS]
              [--min-len MIN_LEN] [--min-copy MIN_COPY] [--min-frac MIN_FRAC]
              [--minimap MINIMAP] [-f] [--high-max-ratio HIGH_MAX_RATIO]
              [--high-min-ratio HIGH_MIN_RATIO]
              [--high-iden-ratio HIGH_IDEN_RATIO]
              [--high-repeat-ratio HIGH_REPEAT_RATIO]
              [--low-repeat-ratio LOW_REPEAT_RATIO] [--Alu ALU]
              [--flank-len FLANK_LEN] [--all-repeat ALL_REPEAT] [-s SITE_DIS]
              [-S END_DIS]
              long.fa ref.fa anno.gtf circRNA.bed/gtf output

PARRIS: Profiling and Annotating ciRcular RNA with Iso-Seq

positional arguments:
  long.fa               Long read data generated from long-read circRNA
                        sequencing technique.
  ref.fa                Reference genome sequence file.
  anno.gtf              Whole gene annotation file in GTF format.
  circRNA.bed/gtf       circRNA annotation file in BED12 or GTF format.
  output                Output directory for final result and temporary files.

optional arguments:
  -h, --help            show this help message and exit
  -v, --version         show program's version number and exit

General options:
  -t THREADS, --threads THREADS
                        Number of thread to use. (default: 8)

Hybrid error-correction with short-read data (LoRDEC):
  --short-read short.fa
                        Short-read data for error correction. Use ',' to
                        connect multiple or paired-end short read data.
                        (default: )
  --lordec LORDEC       Path to lordec-correct. (default: lordec-correct)
  --kmer KMER           k-mer size. (default: 21)
  --solid SOLID         Solid k-mer abundance threshold. (default: 3)

Detecting tandem-repeat with TRF(Tandem Repeat Finder):
  --trf TRF             Path to trf program. (default: trf409.legacylinux64)
  --match MATCH         Match score. (default: 2)
  --mismatch MISMATCH   Mismatch penalty. (default: 7)
  --indel INDEL         Indel penalty. (default: 7)
  --match-frac MATCH_FRAC
                        Match probability. (default: 80)
  --indel-frac INDEL_FRAC
                        Indel probability. (default: 10)
  --min-score MIN_SCORE
                        Minimum alignment score to report. (default: 100)
  --max-period MAX_PERIOD
                        Maximum period size to report. (default: 2000)

Extracting and aligning consensus sequence to genome (minimap2):
  --fxtools FXTOOLS     Path to fxtools. (default: fxtools)
  --min-len MIN_LEN     Minimum consensus length to keep. (default: 30)
  --min-copy MIN_COPY   Minimum copy number of consensus to keep. (default:
                        2.0)
  --min-frac MIN_FRAC   Minimum fraction of original long read to keep.
                        (default: 0.0)
  --minimap MINIMAP     Path to minimap2. (default: minimap2)
  -f, --do-classify     Classify circRNA alignment into high-quality and low-
                        quality. (default: False)
  --high-max-ratio HIGH_MAX_RATIO
                        Maximum mappedLen / consLen ratio for high-quality
                        alignment. (default: 1.1)
  --high-min-ratio HIGH_MIN_RATIO
                        Minimum mappedLen /consLen ratio for high-quality
                        alignment. (default: 0.9)
  --high-iden-ratio HIGH_IDEN_RATIO
                        Minimum identicalBases/ consLen ratio for high-quality
                        alignment. (default: 0.75)
  --high-repeat-ratio HIGH_REPEAT_RATIO
                        Maximum mappedLen / consLen ratio for high-quality
                        self-tandem consensus. (default: 0.6)
  --low-repeat-ratio LOW_REPEAT_RATIO
                        Minimum mappedLen / consLen ratio for low-quality
                        self-tandem alignment. (default: 1.9)

Evaluating circRNA with annotation:
  --Alu ALU             Alu repetitive element annotation in BED format.
                        (default: )
  --flank-len FLANK_LEN
                        Length of upstream and downstream flanking sequence to
                        search for Alu. (default: 500)
  --all-repeat ALL_REPEAT
                        All repetitive element annotation in BED format.
                        (default: )
  -s SITE_DIS, --site-dis SITE_DIS
                        Allowed distance between circRNA internal-splice-site
                        and annoated splice-site. (default: 0)
  -S END_DIS, --end-dis END_DIS
                        Allowed distance between circRNA back-splice-site and
                        annoated splice-site. (default: 10)
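To make the thresholds in the help text concrete, here is one possible reading of the high-quality alignment options (`--high-min-ratio`, `--high-max-ratio`, `--high-iden-ratio`) and of the splice-site distance options (`-s`, `-S`). This is a sketch under stated assumptions and not the PARRIS implementation: the function names are hypothetical, the bounds are assumed to be inclusive, and the three alignment criteria are assumed to be combined with a logical AND. The self-tandem options (`--high-repeat-ratio`, `--low-repeat-ratio`) are left out because the help text does not say how they interact with the other criteria.

```python
def is_high_quality(mapped_len, identical_bases, cons_len,
                    high_min_ratio=0.9, high_max_ratio=1.1,
                    high_iden_ratio=0.75):
    """Apply the documented default thresholds to one alignment (assumed logic)."""
    if cons_len <= 0:
        return False
    mapped_ratio = mapped_len / float(cons_len)     # mappedLen / consLen
    iden_ratio = identical_bases / float(cons_len)  # identicalBases / consLen
    return (high_min_ratio <= mapped_ratio <= high_max_ratio
            and iden_ratio >= high_iden_ratio)


def matches_annotated_site(site, annotated_sites, allowed_dis):
    """True if site lies within allowed_dis bases of any annotated splice site."""
    return any(abs(site - known) <= allowed_dis for known in annotated_sites)


if __name__ == "__main__":
    # A 100-base consensus with 98 mapped bases, 85 of them identical.
    print(is_high_quality(98, 85, 100))
    # With the default -S of 10, a back-splice site 5 bases from an annotated site matches.
    print(matches_annotated_site(1005, [1000, 2000], 10))
    # With the default -s of 0, an internal splice site must match exactly.
    print(matches_annotated_site(1005, [1000, 2000], 0))
```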

Changelog (v1.5.9)

  1. Fixed a bug in the search for known splice junctions.
  2. Known and canonical internal splice strand information is now used to guide the search for back-splice junctions.

