InSilicoSeq

A sequencing simulator

InSilicoSeq is a sequencing simulator producing realistic Illumina reads. Primarily intended for simulating metagenomic samples, it can also be used to produce sequencing data from a single genome.

InSilicoSeq is written in Python and uses kernel density estimators to model the read quality of real sequencing data.
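To give an intuition for what a kernel density estimator does with quality scores, here is a minimal sketch using only the Python standard library. It is not InSilicoSeq's implementation: the function names, the bandwidth, and the observed Phred scores are all made up for illustration.

```python
import math
import random


def gaussian_kde(samples, bandwidth):
    """Return a density function estimated from `samples` with a Gaussian kernel."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum one Gaussian bump per observation, centred on that observation.
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


def sample_quality(samples, bandwidth, rng, q_min=0, q_max=41):
    """Draw one score from the KDE: pick an observation, then add kernel noise."""
    q = rng.gauss(rng.choice(samples), bandwidth)
    # Round to an integer score and clamp to an assumed valid Phred range.
    return min(q_max, max(q_min, round(q)))


# Hypothetical Phred scores observed at a single read position.
observed = [38, 37, 38, 36, 20, 38, 37, 35, 38, 22]
density = gaussian_kde(observed, bandwidth=1.5)

rng = random.Random(42)
simulated = [sample_quality(observed, 1.5, rng) for _ in range(5)]
print(simulated)
```

The estimated density is high near the scores seen most often (around 38 here) and keeps a small bump near the rare low scores, so simulated scores follow the shape of the real data rather than a fixed parametric distribution.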

InSilicoSeq supports substitution, insertion, and deletion errors. If you do not need insertion and deletion errors, a basic error model is also provided.

Installation

InSilicoSeq is available in Bioconda.

To install with conda:

conda install -c bioconda insilicoseq

Or with pip:

pip install InSilicoSeq

Note: InSilicoSeq requires Python >= 3.5

Alternatively, with docker:

docker pull quay.io/biocontainers/insilicoseq:2.0.0--pyh7cba7a3_0

For more installation options, please refer to the full documentation

Usage

InSilicoSeq has two subcommands: iss generate to generate Illumina reads and iss model to create an error model from which the reads will take their characteristics.

InSilicoSeq comes with pre-computed error models that should be sufficient for most use cases.

Generate reads with a pre-computed error model

To generate 1 million reads modelling a MiSeq instrument:

curl -O -J -L https://osf.io/thser/download  # download the example data
iss generate --genomes SRS121011.fasta --model miseq --output miseq_reads

where SRS121011.fasta should be replaced by a (multi-)FASTA file containing the reference genome(s) from which the simulated reads will be generated.
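Before running iss generate on your own file, it can help to check that the (multi-)FASTA contains the records you expect. The helper below is a hypothetical standalone sketch and is not part of InSilicoSeq. It reads a FASTA file and reports the sequence length of each record, and the demo file it writes is invented for the example.

```python
import os
import tempfile


def fasta_record_lengths(path):
    """Map each FASTA record ID (first word of the header) to its sequence length."""
    lengths = {}
    name = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                header = line[1:].split()
                name = header[0] if header else ""
                lengths[name] = 0
            elif name is not None:
                lengths[name] += len(line)
    return lengths


# Demo on a tiny two-record file; point `demo` at your own FASTA instead.
with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "demo.fasta")
    with open(demo, "w") as out:
        out.write(">genome_A description\nACGTACGT\nACGT\n>genome_B\nGGCC\n")
    lengths = fasta_record_lengths(demo)

print(lengths)  # {'genome_A': 12, 'genome_B': 4}
```

An empty result, or a record with length 0, usually points to a malformed input file.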

InSilicoSeq comes with 3 error models: MiSeq, HiSeq and NovaSeq.

If you have built your own model, pass the .npz file to the --model argument to simulate reads from your own error model.

For 10 million reads and a custom error model:

curl -O -J -L https://osf.io/thser/download  # download the example data
iss generate -g SRS121011.fasta -n 10m --model my_model.npz --output /path/to/my_reads

provided you have built my_model.npz with iss model (see below).

For more examples and a full list of options, please refer to the full documentation

Generate reads without input genomes

We can download some for you! InSilicoSeq can download random genomes from NCBI using the infamous E-utilities (eutils).

The command

iss generate --ncbi bacteria -u 10 --model MiSeq --output ncbi_reads

will generate 1 million reads from 10 random bacterial genomes.

For more examples and a full list of options, please refer to the full documentation

Create your own error model

If you do not wish to use the pre-computed error models provided with InSilicoSeq, it is possible to create your own.

Say you have a reference metagenome called genomes.fasta, and read pairs reads_R1.fastq.gz and reads_R2.fastq.gz.

Align your reads against the reference:

bowtie2-build genomes.fasta genomes
bowtie2 -x genomes -1 reads_R1.fastq.gz -2 reads_R2.fastq.gz | \
samtools view -bS | samtools sort -o genomes.bam
samtools index genomes.bam

then build the model:

iss model -b genomes.bam -o genomes

which will create a genomes.npz file containing your newly built model.
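An .npz file is a zip archive of NumPy .npy arrays, so you can check that the model file was written correctly without loading it. The sketch below uses only the standard library. The helper name is hypothetical, and the demo archive uses placeholder member names that do not reflect what a real InSilicoSeq model contains.

```python
import os
import tempfile
import zipfile


def list_npz_arrays(path):
    """Return the array names stored in an .npz archive, without the .npy suffix."""
    if not zipfile.is_zipfile(path):
        raise ValueError(f"{path} is not a valid .npz (zip) archive")
    with zipfile.ZipFile(path) as archive:
        return sorted(
            name[: -len(".npy")]
            for name in archive.namelist()
            if name.endswith(".npy")
        )


# Demo on a stand-in archive with placeholder names; use your model file instead.
with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "demo.npz")
    with zipfile.ZipFile(demo, "w") as archive:
        archive.writestr("example_array_a.npy", b"")
        archive.writestr("example_array_b.npy", b"")
    names = list_npz_arrays(demo)

print(names)  # ['example_array_a', 'example_array_b']
```

If the function raises ValueError for your model file, the file is probably truncated or was not produced by iss model.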

License

Code is under the MIT license.

Issues

Found a bug or have a question? Please open an issue

Contributing

We welcome contributions from the community! See our Contributing guidelines

Citation

If you use our software, please cite us!

Gourlé H, Karlsson-Lindsjö O, Hayer J and Bongcam-Rudloff E, Simulating Illumina data with InSilicoSeq. Bioinformatics (2018) doi:10.1093/bioinformatics/bty630
