
Amplicon read simulator

Project description

Bygul: Amplicon Read Simulator

A tool for amplicon read simulation for wastewater sequencing and other applications. It simulates reads from multiple samples mixed at different proportions.

Usage

If you use this workflow in a paper, please credit the authors by citing the URL of the original repository (https://github.com/andersen-lab/Bygul).

Installation

Bygul is written in Python 3, but it requires the wgsim and mason simulators to simulate reads.

Local build from source

git clone https://github.com/andersen-lab/Bygul
cd Bygul
pip install -e .

Please note that pip does not install all the requirements; some packages must be installed via Conda or built from source.

Installing via Conda

  1. pip install git+https://github.com/andersen-lab/Bygul
  2. Create a conda environment named bygul and install the dependencies:
conda create -n bygul
conda activate bygul
conda env update --file environment.yml

Example commands

Run the tool using the following command.

bygul simulate-proportions [SAMPLE1.fasta,SAMPLE2.fasta,..] [primer.bed] [reference.fasta] --proportions [0.8,0.2,..] --outdir [output_directory]
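Before running a command like the one above, it can help to confirm that the comma-separated sample list and the --proportions list line up. The following is a minimal sketch, not part of Bygul: the helper name parse_proportions is hypothetical, and the requirement that proportions sum to 1 is an assumption rather than documented behavior.

```python
import math


def parse_proportions(samples_arg: str, proportions_arg: str) -> dict[str, float]:
    """Pair each sample file with its proportion (hypothetical helper, not Bygul's API)."""
    samples = [s.strip() for s in samples_arg.split(",") if s.strip()]
    proportions = [float(p) for p in proportions_arg.split(",") if p.strip()]

    # One proportion is expected per sample file.
    if len(samples) != len(proportions):
        raise ValueError(
            f"{len(samples)} samples but {len(proportions)} proportions"
        )
    if any(p < 0 for p in proportions):
        raise ValueError("proportions must be non-negative")
    # Assumption: proportions are fractions of the mixture and sum to 1.
    if not math.isclose(sum(proportions), 1.0, abs_tol=1e-6):
        raise ValueError("proportions must sum to 1")

    return dict(zip(samples, proportions))


if __name__ == "__main__":
    print(parse_proportions("sample.fasta,sample2.fasta", "0.2,0.8"))
```

The two string arguments have the same shape as the values passed on the command line, so the check can be run on them unchanged.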

Simulate reads from different samples without defining proportions, allowing up to 2 SNP mismatches in the primer regions. Proportions are assigned randomly and can be found in results/sample_proportions.txt.

bygul simulate-proportions sample.fasta,sample2.fasta primer.bed reference.fasta --outdir results/ --maxmismatch 2

Simulate reads with user-defined proportions and a specific read simulator. Bygul uses wgsim by default, but you can switch to mason.

bygul simulate-proportions sample.fasta,sample2.fasta primer.bed reference.fasta --proportions 0.2,0.8 --simulator mason

Simulate reads with user-defined proportions and number of reads per amplicon.

bygul simulate-proportions sample.fasta,sample2.fasta primer.bed reference.fasta --proportions 0.2,0.8 --readcnt 1000
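To get a feel for how proportions and read counts interact, the sketch below splits a per-amplicon read count across samples. It assumes that --readcnt is the total for one amplicon shared by all samples, which the text does not confirm, and it uses largest-remainder rounding as one possible scheme. The function split_reads is hypothetical and does not describe how Bygul allocates reads internally.

```python
import math


def split_reads(readcnt: int, proportions: list[float]) -> list[int]:
    """Split readcnt into whole-number shares that follow proportions.

    Hypothetical illustration using largest-remainder rounding.
    """
    if not math.isclose(sum(proportions), 1.0, abs_tol=1e-6):
        raise ValueError("proportions must sum to 1")

    raw = [readcnt * p for p in proportions]
    counts = [int(x) for x in raw]  # round every share down first
    remainder = readcnt - sum(counts)

    # Hand the leftover reads to the shares with the largest fractional parts.
    order = sorted(range(len(raw)), key=lambda i: raw[i] - counts[i], reverse=True)
    for i in order[:remainder]:
        counts[i] += 1
    return counts


if __name__ == "__main__":
    print(split_reads(1000, [0.2, 0.8]))
```

Rounding down and then redistributing the leftover keeps the shares summing exactly to the requested read count.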

Simulate reads with additional parameters such as base error rate, read length, and indel fraction.

bygul simulate-proportions sample.fasta,sample2.fasta primer.bed reference.fasta --proportions 0.2,0.8 --readcnt 1000 --error_rate 0.001 --read_length 400 --indel_fraction 0.001

Notes

Number of reads per amplicon

Set the number of reads per amplicon to be greater than the number of contigs in your amplicon file. This is particularly important when your primers are designed for whole-genome sequencing, where each amplicon may contain a substantial number of contigs. Too few reads per amplicon may produce empty read files for some amplicons, leaving the simulated read set incomplete.
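One way to apply this recommendation is to count the contigs in a FASTA file and compare the result with the intended read count. The sketch below assumes that each contig corresponds to one FASTA header line (a line starting with ">"). The function names and the file name in the usage comment are hypothetical.

```python
from typing import Iterable


def count_fasta_records(lines: Iterable[str]) -> int:
    """Count FASTA records by counting header lines that start with '>'."""
    return sum(1 for line in lines if line.startswith(">"))


def read_count_is_sufficient(lines: Iterable[str], readcnt: int) -> bool:
    """Return True when readcnt is greater than the number of contigs."""
    return readcnt > count_fasta_records(lines)


if __name__ == "__main__":
    example = [">contig1", "ACGTACGT", ">contig2", "GGCCTTAA"]
    print(count_fasta_records(example))
    print(read_count_is_sufficient(example, readcnt=1000))
    # With a real file (hypothetical name):
    # with open("amplicon.fasta") as handle:
    #     print(read_count_is_sufficient(handle, readcnt=1000))
```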

Primer bed file

The primer file must include a column containing the primer sequence. By default, at most 1 SNP mismatch is allowed per primer sequence. To change this limit, use the --maxmismatch flag, as in the example command above.
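To illustrate what a mismatch limit means in practice, here is a naive sliding-window search that accepts a primer site when the number of substitutions is within the limit. This is only a sketch of the general idea: Bygul's actual matching algorithm is not described here, and the functions count_mismatches and find_primer are hypothetical.

```python
from typing import Optional


def count_mismatches(primer: str, window: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(1 for a, b in zip(primer, window) if a != b)


def find_primer(sequence: str, primer: str, max_mismatch: int = 1) -> Optional[int]:
    """Return the 0-based start of the first window that matches the primer
    with at most max_mismatch substitutions, or None if there is none."""
    sequence = sequence.upper()
    primer = primer.upper()
    for start in range(len(sequence) - len(primer) + 1):
        window = sequence[start:start + len(primer)]
        if count_mismatches(primer, window) <= max_mismatch:
            return start
    return None


if __name__ == "__main__":
    # The site ACGAACG differs from the primer ACGTACG at one position.
    print(find_primer("TTTACGAACGTTT", "ACGTACG", max_mismatch=1))
    print(find_primer("TTTACGAACGTTT", "ACGTACG", max_mismatch=0))
```

Raising the limit makes more sites count as primer matches, while a limit of 0 requires an exact match.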

Complete set of available parameters

To learn how to adjust other parameters, run: bygul simulate-proportions --help

Simulated reads output

Simulated reads from all samples are written to provided_output_path/reads.fastq.
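A quick sanity check on the output is to count the simulated reads. The sketch below assumes standard four-line FASTQ records (header, sequence, separator, qualities), which is an assumption about the output layout. The function name is hypothetical, and the path in the usage comment follows the pattern given above.

```python
from typing import Iterable


def count_fastq_records(lines: Iterable[str]) -> int:
    """Count reads in FASTQ text, assuming four non-empty lines per record."""
    n_lines = sum(1 for line in lines if line.strip())
    if n_lines % 4 != 0:
        raise ValueError("line count is not a multiple of 4; file may be truncated")
    return n_lines // 4


if __name__ == "__main__":
    example = ["@read1", "ACGT", "+", "IIII", "@read2", "GGCC", "+", "IIII"]
    print(count_fastq_records(example))
    # With a real file:
    # with open("provided_output_path/reads.fastq") as handle:
    #     print(count_fastq_records(handle))
```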

Information about amplicon dropouts

For details on amplicon dropouts, see provided_output_path/sample_name/amplicon_stats.csv. In this file, the left/right primer matching coordinates are set to zero when no match is found.
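The zero-coordinate convention makes dropouts straightforward to list programmatically. The sketch below is hypothetical: the column names of amplicon_stats.csv are not given here, so the caller passes them in, and the names used in the example (amplicon, left_start, right_start) are placeholders to replace with the real header names. It also assumes that every coordinate cell holds a number.

```python
import csv
import io


def find_dropouts(csv_text: str, name_col: str, coord_cols: list[str]) -> list[str]:
    """Return the names of amplicons with a zero in any given coordinate column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    dropouts = []
    for row in reader:
        # A zero coordinate means no primer match was found on that side.
        if any(float(row[col]) == 0 for col in coord_cols):
            dropouts.append(row[name_col])
    return dropouts


if __name__ == "__main__":
    # Placeholder column names and values, for illustration only.
    example = (
        "amplicon,left_start,right_start\n"
        "amp_1,30,410\n"
        "amp_2,0,0\n"
        "amp_3,640,0\n"
    )
    print(find_dropouts(example, "amplicon", ["left_start", "right_start"]))
```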


