InSilicoSeq
A sequencing simulator
InSilicoSeq is a sequencing simulator producing realistic Illumina reads. Primarily intended for simulating metagenomic samples, it can also be used to produce sequencing data from a single genome.
InSilicoSeq is written in Python and uses kernel density estimators (KDEs) to model the read quality of real sequencing data.
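To make the KDE idea concrete, here is a minimal sketch of drawing quality scores at a single read position from a Gaussian KDE fitted to observed scores. It is not InSilicoSeq's implementation. The toy quality values, the Scott's-rule bandwidth and the 0 to 41 Phred range are all assumptions made for illustration.

```python
import random
import statistics


def scott_bandwidth(samples):
    """Scott's rule of thumb for a 1-D Gaussian KDE: h = sigma * n ** (-1/5)."""
    return statistics.stdev(samples) * len(samples) ** (-1 / 5)


def sample_qualities(observed, k, rng, q_min=0, q_max=41):
    """Draw k Phred scores from a Gaussian KDE fitted to `observed`.

    Sampling from a Gaussian KDE amounts to picking one observed value
    uniformly at random and adding N(0, h^2) noise around it. Draws are
    rounded to integers and clipped to [q_min, q_max] (an assumed range).
    """
    h = scott_bandwidth(observed)
    draws = []
    for _ in range(k):
        centre = rng.choice(observed)
        q = round(rng.gauss(centre, h))
        draws.append(min(max(q, q_min), q_max))
    return draws


# Toy quality scores observed at one read position (made up, not real data).
observed = [38, 37, 39, 38, 36, 40, 38, 35, 39, 37]
simulated = sample_qualities(observed, 5, random.Random(42))
print(simulated)
```

Sampling "pick a data point, then jitter it by the kernel" avoids ever evaluating the density explicitly, which is why it is a common way to generate values from a fitted KDE.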
InSilicoSeq supports substitution, insertion and deletion errors. If you do not need insertion and deletion errors, a basic error model is provided.
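The three error types can be illustrated with a short sketch that corrupts a read base by base. This is a simplified, hypothetical stand-in for a real error model: the per-base rates here are constant along the read, whereas a model learned from data would vary them by position. Setting the insertion and deletion rates to zero mimics a substitution-only, "basic" model.

```python
import random

BASES = "ACGT"


def add_errors(read, p_sub, p_ins, p_del, rng):
    """Return a copy of `read` with random substitution, insertion and deletion errors.

    Rates are per base and identical at every position, a deliberate
    simplification. Setting p_ins = p_del = 0 gives a substitution-only model.
    """
    out = []
    for base in read:
        r = rng.random()
        if r < p_del:
            continue  # deletion: the base is dropped
        if r < p_del + p_sub:
            # substitution: swap in one of the other three bases
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
        if rng.random() < p_ins:
            out.append(rng.choice(BASES))  # insertion right after this base
    return "".join(out)


rng = random.Random(1)
read = "ACGTACGTACGTACGT"
print(add_errors(read, p_sub=0.05, p_ins=0.01, p_del=0.01, rng=rng))
```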
Installation
InSilicoSeq is available in bioconda.
To install with conda:
conda install -c bioconda insilicoseq
Or with pip:
pip install InSilicoSeq
Note: InSilicoSeq requires Python >= 3.5
Alternatively, with docker:
docker pull quay.io/biocontainers/insilicoseq:2.0.0--pyh7cba7a3_0
For more installation options, please refer to the full documentation
Usage
InSilicoSeq has two subcommands: iss generate, to generate Illumina reads, and iss model, to create an error model from which the reads will take their characteristics.
InSilicoSeq comes with pre-computed error models that should be sufficient for most use cases.
Generate reads with a pre-computed error model
For example, to generate 1 million reads modelling a MiSeq instrument:
curl -O -J -L https://osf.io/thser/download # download the example data
iss generate --genomes SRS121011.fasta --model miseq --output miseq_reads
where SRS121011.fasta (the example data downloaded above) can be replaced by a (multi-)fasta file containing the reference genome(s) from which the simulated reads will be generated.
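Before passing your own reference to --genomes, it can be handy to check how many records the (multi-)fasta file holds and how long each one is. The sketch below is a small standard-library FASTA reader written for illustration; it is not part of InSilicoSeq, and the demo records are made up.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from the lines of a (multi-)FASTA file.

    `lines` can be an open file handle or any iterable of strings.
    """
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # a sequence may wrap over several lines
    if header is not None:
        yield header, "".join(chunks)


# Made-up records standing in for a real multi-fasta file.
demo = [">genome_1 plasmid", "ACGT", "ACGT", ">genome_2", "GGCC"]
for header, seq in read_fasta(demo):
    print(header.split()[0], len(seq))
```

On a real file, pass the open handle instead: `with open("SRS121011.fasta") as fh: records = list(read_fasta(fh))`.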
InSilicoSeq comes with 3 error models: MiSeq, HiSeq and NovaSeq.
If you have built your own model, pass the .npz file to the --model argument to simulate reads from your own error model.
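A .npz file written by numpy.savez is a ZIP archive holding one .npy member per stored array, so you can peek at what a custom model contains with the standard library alone. The sketch below only lists array names; it does not document InSilicoSeq's model layout, and the member name in the demo is invented.

```python
import io
import zipfile


def list_npz_arrays(source):
    """Return the array names stored in a .npz file (path or binary file-like object)."""
    with zipfile.ZipFile(source) as archive:
        # Each array is stored as '<name>.npy'; strip the suffix for readability.
        return [name[:-4] if name.endswith(".npy") else name
                for name in archive.namelist()]


# Self-contained demo: an in-memory archive with the same layout.
# The member name is made up; it is NOT one of InSilicoSeq's real keys.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("example_array.npy", b"")
buffer.seek(0)
demo_names = list_npz_arrays(buffer)
print(demo_names)  # ['example_array']
```

For a model on disk, call `list_npz_arrays("my_model.npz")`; numpy.load("my_model.npz").files gives the same names if numpy is installed.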
For 10 million reads and a custom error model:
curl -O -J -L https://osf.io/thser/download # download the example data
iss generate -g SRS121011.fasta -n 10m --model my_model.npz --output /path/to/my_reads
provided you have built my_model.npz with iss model (see below).
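If you drive simulations from a Python script, one option is to assemble the iss generate command as an argument list and hand it to subprocess. This sketch uses only the flags shown in the examples above (--genomes, --model, --output, -n); the helper names are hypothetical and nothing here is an InSilicoSeq Python API.

```python
import shutil
import subprocess


def iss_generate_command(genomes, model, output, n_reads=None):
    """Build an `iss generate` argument list from the flags used in this README."""
    cmd = ["iss", "generate", "--genomes", genomes, "--model", model,
           "--output", output]
    if n_reads is not None:
        cmd += ["-n", str(n_reads)]  # e.g. "10m" for 10 million reads
    return cmd


def run_iss_generate(**kwargs):
    """Run the command if the `iss` executable is on PATH, else raise."""
    if shutil.which("iss") is None:
        raise FileNotFoundError("iss executable not found on PATH")
    subprocess.run(iss_generate_command(**kwargs), check=True)


cmd = iss_generate_command("SRS121011.fasta", "my_model.npz",
                           "/path/to/my_reads", n_reads="10m")
print(" ".join(cmd))
```

Passing a list rather than a single shell string avoids quoting problems with paths that contain spaces.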
For more examples and a full list of options, please refer to the full documentation
Generate reads without input genomes
We can download some for you! InSilicoSeq can download random genomes from NCBI using the infamous eutils.
The command
iss generate --ncbi bacteria -u 10 --model MiSeq --output ncbi_reads
will generate 1 million reads from 10 random bacterial genomes.
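For a feel of what "random genomes via eutils" involves, the sketch below builds an E-utilities esearch URL and then picks a random subset of the returned IDs. The database (nuccore) and query term are illustrative assumptions; this README does not say which query InSilicoSeq actually sends. The URL is only built, not fetched, so the example runs offline.

```python
import random
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(db, term, retmax=1000):
    """Build an esearch URL that returns matching UIDs as JSON."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return ESEARCH + "?" + urlencode(params)


def pick_random_ids(ids, k, seed=None):
    """Choose k distinct IDs uniformly at random (requires k <= len(ids))."""
    return random.Random(seed).sample(ids, k)


# Illustrative query only; not necessarily what InSilicoSeq uses.
url = esearch_url("nuccore", "Bacteria[Organism] AND complete genome[Title]")
print(url)
```

Fetching the URL (for instance with urllib.request.urlopen) returns a JSON document whose ID list could then be passed to pick_random_ids.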
For more examples and a full list of options, please refer to the full documentation
Create your own error model
If you do not wish to use the pre-computed error models provided with InSilicoSeq, it is possible to create your own.
Say you have a reference metagenome called genomes.fasta, and read pairs reads_R1.fastq.gz and reads_R2.fastq.gz. Align your reads against the reference:
bowtie2-build genomes.fasta genomes
bowtie2 -x genomes -1 reads_R1.fastq.gz -2 reads_R2.fastq.gz | \
samtools view -bS | samtools sort -o genomes.bam
samtools index genomes.bam
then build the model:
iss model -b genomes.bam -o genomes
which will create a genomes.npz file containing your newly built model.
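One ingredient an error model can learn from such an alignment is how often the read disagrees with the reference at each read position. The sketch below estimates a per-position substitution rate from already-aligned, gap-free read/reference pairs. It is a simplified illustration, not what iss model computes internally: it ignores indels and quality scores, and the input pairs are toy data.

```python
def per_position_mismatch_rate(pairs):
    """Estimate a substitution rate at each read position.

    `pairs` yields (read, reference) strings of equal length, already aligned
    without gaps (a simplifying assumption: real alignments contain indels).
    """
    mismatches, totals = [], []
    for read, ref in pairs:
        if len(read) != len(ref):
            raise ValueError("read and reference segment must have equal length")
        for i, (r, f) in enumerate(zip(read, ref)):
            if i == len(totals):  # first read reaching this position
                mismatches.append(0)
                totals.append(0)
            totals[i] += 1
            mismatches[i] += r != f
    return [m / t for m, t in zip(mismatches, totals)]


rates = per_position_mismatch_rate([
    ("ACGT", "ACGT"),
    ("ACGA", "ACGT"),
    ("TCGA", "ACGT"),
    ("ACGT", "ACGT"),
])
print(rates)  # [0.25, 0.0, 0.0, 0.5]
```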
License
Code is under the MIT license.
Issues
Found a bug or have a question? Please open an issue
Contributing
We welcome contributions from the community! See our Contributing guidelines
Citation
If you use our software, please cite us!
Gourlé H, Karlsson-Lindsjö O, Hayer J and Bongcam-Rudloff E, Simulating Illumina data with InSilicoSeq. Bioinformatics (2018) doi:10.1093/bioinformatics/bty630
Project details
Download files
Source Distribution
Built Distribution
File details
Details for the file InSilicoSeq-2.0.1.tar.gz.
File metadata
- Download URL: InSilicoSeq-2.0.1.tar.gz
- Upload date:
- Size: 4.4 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | e7d6493f5fad4de059d355a77041d9b077c9edc13ec03552aeaa4f527b6f4e91
MD5 | 8f444854203d5456fa50f745692c58df
BLAKE2b-256 | 34f1efc069f047d2cccbbbecf8515c6054155eaa15438f0a427d80fb3f35f41d
File details
Details for the file InSilicoSeq-2.0.1-py3-none-any.whl.
File metadata
- Download URL: InSilicoSeq-2.0.1-py3-none-any.whl
- Upload date:
- Size: 4.4 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | 2874a412a935608f983d20bd8828eb7dccab08dc2823b93fff84b13d4210c816
MD5 | 168e3ba1de49ed1ffdeb90c8ae1fd73a
BLAKE2b-256 | 7312fee05d1c8060f774ea205bb2443d20b7cb620eacfd2c8a4680837256df7d
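To check a downloaded file against the published digests, you can hash it locally with Python's hashlib. The helper names below are hypothetical; the only library calls are standard hashlib functions, with BLAKE2b-256 obtained as blake2b with a 32-byte digest.

```python
import hashlib
import io


def new_hasher(algorithm):
    """Return a hashlib object for 'sha256', 'md5' or 'blake2b-256'."""
    if algorithm == "blake2b-256":
        return hashlib.blake2b(digest_size=32)  # BLAKE2b with a 256-bit digest
    return hashlib.new(algorithm)


def stream_digest(handle, algorithm="sha256", chunk_size=1 << 20):
    """Hash a binary file-like object in chunks so large files need not fit in memory."""
    hasher = new_hasher(algorithm)
    for chunk in iter(lambda: handle.read(chunk_size), b""):
        hasher.update(chunk)
    return hasher.hexdigest()


def verify_file(path, expected, algorithm="sha256"):
    """Return True if the file at `path` matches the published hex digest."""
    with open(path, "rb") as handle:
        return stream_digest(handle, algorithm) == expected.strip().lower()


# Self-check against the well-known SHA256 test vector for b"abc".
print(stream_digest(io.BytesIO(b"abc")))
```

For example, `verify_file("InSilicoSeq-2.0.1.tar.gz", "e7d6493f5fad4de059d355a77041d9b077c9edc13ec03552aeaa4f527b6f4e91")` should return True for an intact source archive.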