pywgsim
pywgsim is a Python wrapper around the wgsim short-read simulator.
Usage
pywgsim -h
or
python -m pywgsim
Installation
pip install pywgsim
API
wgsim can be run from Python with a single function call:
from pywgsim import wgsim
wgsim.core(
    r1="r1.fq", r2="r2.fq", ref="genome.fa",
    err_rate=0.02, mut_rate=0.001, indel_frac=0.15, indel_ext=0.25, max_n=0.05,
    is_hap=0, N=100000, dist=500, stdev=50,
    size_l=100, size_r=100, is_fixed=0, seed=0,
)
Changes
The original wgsim code has been extended slightly. The main changes are:
- The information on the mutations introduced by wgsim is now generated in GFF format.
- There is a new flag called --fixed that generates the same number of reads, N, for each chromosome.
- The separator character in the read name has been changed from _ to |. This follows a more widely accepted convention (e.g. NCBI) and allows identifying the contig name from the read name.
In the default operation of wgsim, the N reads are distributed so as to create uniform coverage across all chromosomes (longer chromosomes get a larger fraction of N).
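The difference between the default and --fixed behaviour can be illustrated with a small sketch. This is not the code used by wgsim or pywgsim: the function name allocate_reads and the chromosome names are hypothetical, and the exact rounding used by the simulator may differ.

```python
def allocate_reads(chrom_lengths, n, fixed=False):
    """Return a dict mapping chromosome name to a number of reads.

    Hypothetical helper for illustration only; it is not part of pywgsim.
    """
    if fixed:
        # --fixed style: every chromosome gets the same N reads.
        return {name: n for name in chrom_lengths}
    # Default style: split N in proportion to chromosome length,
    # so that coverage is roughly uniform.
    total = sum(chrom_lengths.values())
    return {
        name: int(n * length / total + 0.5)  # round half up (assumed)
        for name, length in chrom_lengths.items()
    }


lengths = {"chrA": 3000, "chrB": 1000}  # made-up chromosome sizes
proportional = allocate_reads(lengths, 1000)
same_for_all = allocate_reads(lengths, 1000, fixed=True)
print(proportional)
print(same_for_all)
```

In the proportional case the longer chromosome receives three times as many reads as the shorter one; in the fixed case both receive N.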
Mutation output
The output generated by pywgsim looks like this:
##gff-version 3
#
# N=1000 err_rate=0.02 mut_rate=0.001 indel_frac=0.15000001 indel_ext=0.25 size=500 std=50 len1=100 len2=100 seed=1606965870
#
NC_001416.1 wgsim snp 1047 1047 . + . Name=A/C;Ref=A;Alt=C;Type=hom
NC_001416.1 wgsim snp 1308 1308 . + . Name=C/Y;Ref=C;Alt=Y;Type=het
NC_001416.1 wgsim snp 1533 1533 . + . Name=G/T;Ref=G;Alt=T;Type=hom
NC_001416.1 wgsim snp 2472 2472 . + . Name=C/M;Ref=C;Alt=M;Type=het
NC_001416.1 wgsim snp 2964 2964 . + . Name=A/M;Ref=A;Alt=M;Type=het
NC_001416.1 wgsim snp 5375 5375 . + . Name=G/R;Ref=G;Alt=R;Type=het
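A minimal sketch of how this output could be read back into Python, assuming the nine standard GFF3 columns and the attribute keys shown above (Name, Ref, Alt, Type). The helper names parse_gff_line and read_mutations are hypothetical and not part of pywgsim. Columns are split on any whitespace, which works for the example above because the attribute column contains no spaces.

```python
def parse_gff_line(line):
    """Parse one GFF3 feature line into a dict (illustrative helper)."""
    fields = line.split(None, 8)  # nine whitespace-separated columns
    if len(fields) != 9:
        raise ValueError("expected 9 GFF columns, got %d" % len(fields))
    seqid, source, ftype, start, end, score, strand, phase, attrs = fields
    # The attribute column is a ';'-separated list of key=value pairs.
    attributes = dict(
        item.split("=", 1) for item in attrs.strip().split(";") if item
    )
    return {
        "seqid": seqid,
        "source": source,
        "type": ftype,
        "start": int(start),
        "end": int(end),
        "strand": strand,
        "attributes": attributes,
    }


def read_mutations(lines):
    """Yield parsed features, skipping blank lines and '#' comments."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        yield parse_gff_line(line)


example = [
    "##gff-version 3",
    "#",
    "NC_001416.1 wgsim snp 1047 1047 . + . Name=A/C;Ref=A;Alt=C;Type=hom",
    "NC_001416.1 wgsim snp 1308 1308 . + . Name=C/Y;Ref=C;Alt=Y;Type=het",
]
mutations = list(read_mutations(example))
het_positions = [m["start"] for m in mutations if m["attributes"]["Type"] == "het"]
print(het_positions)
```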
New read names
The read names are now of the form:
@NC_002945.4|1768156|1768694|0:0:0|4:0:0|4
Where:
- NC_002945.4 is the contig name that the fragment was generated from.
- 1768156 is the left-most position of the fragment.
- 1768694 is the right-most position of the fragment.
- 0:0:0 gives the number of errors, substitutions and indels in the left-most read of the pair.
- 4:0:0 gives the number of errors, substitutions and indels in the right-most read of the pair.
- 4 is the read pair number, unique per contig.
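Because the fields are separated by |, a read name can be split back into its parts. The sketch below assumes exactly the six fields described above; the ReadName type and parse_read_name function are hypothetical and not part of pywgsim. Splitting from the right keeps the contig name intact even if it happens to contain a | itself, and the pair number is kept as text because its encoding is not stated here.

```python
from typing import NamedTuple, Tuple


class ReadName(NamedTuple):
    contig: str
    start: int                    # left-most position of the fragment
    end: int                      # right-most position of the fragment
    left: Tuple[int, int, int]    # errors, substitutions, indels (left read)
    right: Tuple[int, int, int]   # errors, substitutions, indels (right read)
    pair: str                     # read pair number, kept as text


def parse_read_name(name):
    """Split a pywgsim-style read name into its fields (illustrative)."""
    parts = name.lstrip("@").rsplit("|", 5)
    if len(parts) != 6:
        raise ValueError("expected 6 '|'-separated fields: %r" % name)
    contig, start, end, left, right, pair = parts

    def counts(text):
        errors, subs, indels = (int(x) for x in text.split(":"))
        return (errors, subs, indels)

    return ReadName(contig, int(start), int(end), counts(left), counts(right), pair)


read = parse_read_name("@NC_002945.4|1768156|1768694|0:0:0|4:0:0|4")
print(read.contig, read.end - read.start, read.right)
```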