Skip to main content
Help us improve PyPI by participating in user testing. All experience levels needed!

SimLoRD is a read simulator for long reads from third generation sequencing and is currently focused on the Pacific Biosciences SMRT error model.

Project description

SimLoRD is a read simulator for third generation sequencing reads and is currently focused on the Pacific Biosciences SMRT error model.

Reads are simulated from both strands of a provided or randomly generated reference sequence.

Features

  • The reference can be read from a FASTA file or randomly generated with a given GC content. It can consist of several chromosomes, whose structure is respected when drawing reads. (Simulation of genome rearrangements may be incorporated at a later stage.)
  • The read lengths can be determined in four ways: drawing from a log-normal distribution (typical for genomic DNA), sampling from an existing FASTQ file (typical for RNA), sampling from a a text file with integers (RNA), or using a fixed length
  • Quality values and number of passes depend on fragment length.
  • Provided subread error probabilities are modified according to number of passes
  • Outputs reads in FASTQ format and alignments in SAM format

System requirements

We recommend using miniconda and creating an environment for SimLoRD

# Create and activate a new environment called simlord
conda create -n simlord python=3 pip numpy scipy cython
source activate simlord

# Install packages that are not available with conda from pip
pip install pysam
pip install dinopy
pip install simlord

# You now have a 'simlord' script; try it:
simlord --help

# To switch back to your normal environment, use
source deactivate

Platform support

SimLoRD is a pure Python program. This means that it runs on any operating system (OS) for which Python 3 and the other packages are available.

Example usage

Example 1: Simulate 10000 reads for the reference ref.fasta, use the default options for simulation and store the reads in myreads.fastq and the alignment in myreads.sam.

simlord  --read-reference ref.fasta -n 10000  myreads

Example 2: Generate a reference with 10 mio bases GC content 0.6 (i.e., probability 0.3 for both C and G; thus 0.2 probability for both A and T), store the reference as random.fasta, and simulate 10000 reads with default options, store reads as myreads.fastq, do not store alignments.

simlord --generate-reference 0.6 10000000 --save-reference random.fasta\
        -n 10000 --nosam  myreads

Example 3: Simulate reads from the given reference.fasta, using a fixed read length of 5000 and custom subread error probabilities (12% insertion, 12% deletion, 2% substitution). As before, save reads as myreads.fastq and myreads.sam.

simlord --read-reference reference.fasta  -n 10000 -fl 5000\
        -pi 0.12 -pd 0.12 -ps 0.02  myreads

A full list of parameters, as well as their documentation, can be found here.

License

SimLoRD is Open Source and licensed under the MIT License.

Project details


Release history Release notifications

This version
History Node

1.0.2

History Node

1.0.1

History Node

1.0.0

History Node

0.7.4

History Node

0.7.3

History Node

0.7.2

History Node

0.7.1

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Filename, size & hash SHA256 hash help File type Python version Upload date
simlord-1.0.2.zip (18.5 kB) Copy SHA256 hash SHA256 Source None Mar 17, 2017

Supported by

Elastic Elastic Search Pingdom Pingdom Monitoring Google Google BigQuery Sentry Sentry Error logging CloudAMQP CloudAMQP RabbitMQ AWS AWS Cloud computing Fastly Fastly CDN DigiCert DigiCert EV certificate StatusPage StatusPage Status page