SimLoRD is a read simulator for long reads from third generation sequencing and is currently focused on the Pacific Biosciences SMRT error model.

Reads are simulated from both strands of a provided or randomly generated reference sequence.
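
As a rough illustration of what drawing a read from either strand means, the sketch below picks a random interval and strand and returns the reverse complement for the reverse strand. It is not SimLoRD's actual code: `reverse_complement` and `draw_read` are hypothetical helper names, and no sequencing errors are added.

```python
import random
from typing import Tuple

# Complement table for upper-case DNA; N stays N.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def draw_read(reference: str, length: int, rng: random.Random) -> Tuple[int, str, str]:
    """Pick a random interval and strand; return (start, strand, sequence).

    Hypothetical helper for illustration only: the read is error-free.
    """
    if length > len(reference):
        raise ValueError("read length exceeds reference length")
    start = rng.randrange(len(reference) - length + 1)
    fragment = reference[start:start + length]
    if rng.random() < 0.5:
        return start, "+", fragment
    # A read from the reverse strand is the reverse complement of the interval.
    return start, "-", reverse_complement(fragment)


start, strand, read = draw_read("ACGTTGCAAGGCTTAC", 6, random.Random(0))
print(start, strand, read)
```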

Features

  • The reference can be read from a FASTA file or randomly generated with a given GC content. It can consist of several chromosomes, whose structure is respected when drawing reads. (Simulation of genome rearrangements may be incorporated at a later stage.)

  • The read lengths can be determined in four ways: drawing from a log-normal distribution (typical for genomic DNA), sampling from an existing FASTQ file (typical for RNA), sampling from a text file with integers (RNA), or using a fixed length.

  • Quality values and number of passes depend on fragment length.

  • Provided subread error probabilities are modified according to the number of passes.

  • Reads are output in FASTQ format and alignments in SAM format.
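
The read-length options listed above can be pictured with a small sketch. It assumes nothing about SimLoRD's internals: the function names, the minimum-length filter, and the log-normal parameters `mu` and `sigma` are illustrative placeholders, not SimLoRD defaults.

```python
import random


def lognormal_lengths(n, mu, sigma, min_length, rng):
    """Draw n read lengths from a log-normal distribution, redrawing short ones."""
    lengths = []
    while len(lengths) < n:
        length = int(rng.lognormvariate(mu, sigma))
        if length >= min_length:
            lengths.append(length)
    return lengths


def sampled_lengths(n, observed, rng):
    """Draw n read lengths (with replacement) from observed lengths, e.g.
    integers parsed from a text file or the lengths of reads in a FASTQ file."""
    return [rng.choice(observed) for _ in range(n)]


def fixed_lengths(n, length):
    """Every read has the same length."""
    return [length] * n


rng = random.Random(42)
print(lognormal_lengths(5, mu=9.0, sigma=0.4, min_length=50, rng=rng))
print(sampled_lengths(5, observed=[1200, 3400, 800], rng=rng))
print(fixed_lengths(3, 5000))
```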

System requirements

We recommend using Miniconda and creating a dedicated environment for SimLoRD:

#!bash
# Create and activate a new environment called simlord
conda create -n simlord python=3 pip numpy scipy cython
source activate simlord   # with conda >= 4.4: conda activate simlord

# Install packages that are not available with conda from pip
pip install pysam
pip install dinopy
pip install simlord

# You now have a 'simlord' script; try it:
simlord --help

# To switch back to your normal environment, use
source deactivate         # with conda >= 4.4: conda deactivate

Platform support

SimLoRD is a pure Python program. This means that it runs on any operating system (OS) for which Python 3 and the other packages are available.

Example usage

Example 1: Simulate 10000 reads from the reference ref.fasta using the default options, and store the reads in myreads.fastq and the alignments in myreads.sam.

#!bash
simlord --read-reference ref.fasta -n 10000 myreads
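
To sanity-check the result, the reads can be counted with a few lines of Python. This sketch assumes the output is plain four-line FASTQ (header, sequence, separator, and quality line per read) and uses the file name myreads.fastq from the example; `fastq_stats` is a hypothetical helper, not part of SimLoRD.

```python
import os


def fastq_stats(path):
    """Return (number of reads, mean read length) for a four-line FASTQ file."""
    n_reads = 0
    total_bases = 0
    with open(path) as handle:
        for i, line in enumerate(handle):
            if i % 4 == 0:
                # Header line of a record; tolerate a trailing empty line.
                if line.strip() and not line.startswith("@"):
                    raise ValueError(f"line {i + 1}: expected a FASTQ header")
            elif i % 4 == 1:
                # Sequence line: count the read and its bases.
                n_reads += 1
                total_bases += len(line.strip())
    mean_length = total_bases / n_reads if n_reads else 0.0
    return n_reads, mean_length


# Only runs the check if the example output is present.
if os.path.exists("myreads.fastq"):
    print(fastq_stats("myreads.fastq"))
```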

Example 2: Generate a reference with 10 million bases and GC content 0.6 (i.e., probability 0.3 each for C and G, and thus 0.2 each for A and T), store it as random.fasta, and simulate 10000 reads with default options. Store the reads as myreads.fastq; do not store alignments.

#!bash
simlord --generate-reference 0.6 10000000 --save-reference random.fasta \
        -n 10000 --nosam myreads
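
The GC-content arithmetic above (0.3 each for C and G, 0.2 each for A and T) can be checked with a minimal sketch. `random_reference` is a hypothetical function written for this illustration; it is not how SimLoRD generates its reference, and it uses a shorter sequence than the example to keep the check fast.

```python
import random


def random_reference(length, gc_content, rng):
    """Generate a random DNA string in which G and C each have probability
    gc_content / 2 and A and T each have probability (1 - gc_content) / 2."""
    if not 0.0 <= gc_content <= 1.0:
        raise ValueError("gc_content must be between 0 and 1")
    at = (1.0 - gc_content) / 2
    gc = gc_content / 2
    # Weights are given in the order A, C, G, T.
    return "".join(rng.choices("ACGT", weights=[at, gc, gc, at], k=length))


ref = random_reference(100_000, 0.6, random.Random(1))
gc_fraction = (ref.count("G") + ref.count("C")) / len(ref)
print(f"observed GC fraction: {gc_fraction:.3f}")  # close to, but not exactly, 0.6
```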

Example 3: Simulate reads from the given reference.fasta, using a fixed read length of 5000 and custom subread error probabilities (12% insertion, 12% deletion, 2% substitution). As before, save the reads as myreads.fastq and the alignments as myreads.sam.

#!bash
simlord --read-reference reference.fasta -n 10000 -fl 5000 \
        -pi 0.12 -pd 0.12 -ps 0.02 myreads
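
To get a feel for what error probabilities of this size do to a sequence, here is a toy model that applies independent per-base insertion, deletion, and substitution events. It is only a sketch under simplifying assumptions: `add_errors` is a hypothetical function, and unlike SimLoRD it ignores the number of passes and does not produce quality values.

```python
import random


def add_errors(seq, p_ins, p_del, p_sub, rng):
    """Toy error model: before each base, insert a random base with probability
    p_ins; then delete the base (p_del), substitute it (p_sub), or keep it."""
    out = []
    for base in seq:
        if rng.random() < p_ins:
            out.append(rng.choice("ACGT"))
        r = rng.random()
        if r < p_del:
            continue  # base is deleted
        if r < p_del + p_sub:
            # Substitute with one of the three other bases.
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            out.append(base)
    return "".join(out)


original = "ACGT" * 10
noisy = add_errors(original, p_ins=0.12, p_del=0.12, p_sub=0.02, rng=random.Random(3))
print(original)
print(noisy)
```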

A full list of parameters, as well as their documentation, can be found under XXXX.

License

SimLoRD is Open Source and licensed under the MIT License.

Source Distribution

simlord-0.7.1.zip (16.7 kB)
