
Amplicon read simulator


Bygul: Amplicon & Metagenomics Read Simulator

Bygul is a Python 3 tool designed for simulating sequencing reads in wastewater surveillance and other metagenomic applications. It allows users to simulate complex multi-sample datasets with customizable proportions using industry-standard backends like wgsim and mason.


🏗 Installation

Bygul requires Python 3. Because it relies on external simulators (wgsim and mason), we recommend using Conda to manage dependencies. For more information on wgsim and mason, see their documentation.

Option 1: Via Conda (Recommended)

conda create -n bygul bioconda::bygul

Option 2: Via PyPI

pip install bygul

Note: Some binary dependencies (wgsim/mason) may need to be installed manually or built from source if using this method.

Option 3: Local Build from Source

git clone https://github.com/andersen-lab/Bygul
cd Bygul
pip install -e .

🧬 Usage: Amplicon Sequencing Mode

Use this mode when simulating specific genomic regions defined by a primer set.

Basic Command

bygul simulate-proportions [SAMPLE1.fasta,SAMPLE2.fasta] --primers [primer.bed] --reference [reference.fasta] --proportions [0.8,0.2] --outdir [output_dir]
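When a simulation is launched from a script, the basic command can be assembled programmatically. The sketch below is a minimal example and not part of Bygul: `build_bygul_command` is a hypothetical helper that uses only the flags shown above, and it assumes that there should be one proportion per sample and that the proportions should sum to 1.

```python
import math


def build_bygul_command(samples, primers, reference, outdir, proportions=None):
    """Assemble the argument list for `bygul simulate-proportions`.

    Hypothetical helper: only flags documented in this README are used.
    """
    if not samples:
        raise ValueError("at least one sample FASTA is required")
    cmd = [
        "bygul", "simulate-proportions", ",".join(samples),
        "--primers", primers,
        "--reference", reference,
        "--outdir", outdir,
    ]
    if proportions is not None:
        # Assumption: one proportion per sample, summing to 1.
        if len(proportions) != len(samples):
            raise ValueError("need exactly one proportion per sample")
        if not math.isclose(sum(proportions), 1.0, abs_tol=1e-9):
            raise ValueError("proportions should sum to 1")
        cmd += ["--proportions", ",".join(str(p) for p in proportions)]
    return cmd


if __name__ == "__main__":
    cmd = build_bygul_command(
        ["SAMPLE1.fasta", "SAMPLE2.fasta"],
        "primer.bed", "reference.fasta", "output_dir",
        proportions=[0.8, 0.2],
    )
    print(" ".join(cmd))
```

The returned list can be handed to `subprocess.run(cmd, check=True)` once Bygul is installed. Building a list rather than a single string avoids shell quoting problems with file names.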

Advanced Examples

  • Random Proportions & Mismatches: Omit --proportions to simulate with random proportions, and allow up to 2 SNPs in primer regions.
    bygul simulate-proportions sample1.fasta,sample2.fasta --primers primer.bed --reference reference.fasta --outdir results/ --maxmismatch 2
    
  • Switching Simulators: Use mason instead of the default wgsim.
    bygul simulate-proportions sample1.fasta,sample2.fasta --primers primer.bed --reference reference.fasta --simulator mason
    
  • Custom Error Rates & Lengths: Pass simulator-specific parameters (e.g. indel fraction -R) directly.
    bygul simulate-proportions sample1.fasta,sample2.fasta --primers primer.bed --reference reference.fasta -R 0.01
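To make the idea behind `--maxmismatch` concrete, the following sketch counts base mismatches between a primer and an equally long target site and compares the count to a threshold. It illustrates the concept only; how Bygul matches primers internally is not described here, and `count_mismatches` and `primer_binds` are hypothetical names.

```python
def count_mismatches(primer, site):
    """Return the number of positions where primer and site differ."""
    if len(primer) != len(site):
        raise ValueError("primer and site must have the same length")
    return sum(a != b for a, b in zip(primer.upper(), site.upper()))


def primer_binds(primer, site, max_mismatch=1):
    """True if the site differs from the primer at no more than max_mismatch bases."""
    return count_mismatches(primer, site) <= max_mismatch


print(primer_binds("ACGTACGT", "ACGTACGA"))                  # one mismatch -> True
print(primer_binds("ACGTACGT", "TCGTACGA"))                  # two mismatches -> False
print(primer_binds("ACGTACGT", "TCGTACGA", max_mismatch=2))  # threshold raised -> True
```

Under this model, raising the threshold lets more divergent samples still yield an amplicon, while a lower threshold produces more amplicon dropouts.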
    

🌍 Usage: Metagenomics Mode

Simulate reads from entire samples without requiring a primer BED file or a reference sequence.

Basic Metagenomics Simulation

bygul simulate-proportions sample1.fasta,sample2.fasta --outdir results/ --simulation_mode metagenomics

Metagenomics with Specific Parameters

bygul simulate-proportions sample1.fasta,sample2.fasta --proportions 0.5,0.5 --outdir results/ --simulation_mode metagenomics --simulator mason --illumina-read-length 200

📝 Technical Notes

Parameter Handling

Bygul acts as a wrapper. Most flags are passed directly to the underlying simulators, but the following are managed by Bygul itself to produce more realistic simulations (amplicon simulation mode only):

  • --readcnt: Number of reads per amplicon.
  • --wgsim_insert_size: Insert size for wgsim.
  • --wgsim_read_length / --wgsim_error_rate: Read length and error rate for wgsim.

To see all available backend flags, run:

wgsim --help
mason_simulator --help
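The wrapper pattern described above, where some options are handled by the tool and the rest are forwarded to a backend, can be sketched with `argparse.parse_known_args`. This illustrates the general technique and is not Bygul's actual argument parser; `split_args` is a hypothetical name, and only flags mentioned in this README appear.

```python
import argparse


def split_args(argv):
    """Separate wrapper-managed options from flags to forward to a backend."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--readcnt", type=int)
    parser.add_argument("--simulator", default="wgsim")
    # Unrecognized arguments are returned untouched instead of raising an error.
    managed, passthrough = parser.parse_known_args(argv)
    return managed, passthrough


managed, passthrough = split_args(
    ["--readcnt", "500", "--simulator", "mason", "-R", "0.01"]
)
print(managed.readcnt, managed.simulator)
print(passthrough)
```

Here `--readcnt` and `--simulator` are parsed by the wrapper, while `-R 0.01` ends up in `passthrough`, ready to be appended to the backend command line.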

Best Practices

  • Read Counts: Set --readcnt higher than the number of contigs in your amplicon file. Too few reads can result in empty files for certain amplicons.
  • Primer Files: The BED file must include a column with the primer sequence. Bygul allows 1 SNP mismatch by default; use --maxmismatch to change this.
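Before a long run, it can help to confirm that every row of the primer BED file carries a sequence. The sketch below is one way to do that, under stated assumptions: the file is tab-separated, the sequence may sit in any column after the three required BED fields (the exact column Bygul expects is not specified here), and a field of at least 10 IUPAC nucleotide letters counts as a sequence. `rows_missing_primer_sequence` is a hypothetical helper.

```python
import re

# IUPAC nucleotide codes; a field made only of these is treated as a sequence.
SEQ_PATTERN = re.compile(r"^[ACGTURYSWKMBDHVN]+$", re.IGNORECASE)


def rows_missing_primer_sequence(bed_path, min_length=10):
    """Return 1-based line numbers of BED rows with no sequence-like column.

    Assumption: columns 1-3 are chrom, start, end; later columns are searched.
    """
    missing = []
    with open(bed_path) as handle:
        for lineno, line in enumerate(handle, start=1):
            line = line.strip()
            # Skip blank lines and common BED header lines.
            if not line or line.startswith(("#", "track", "browser")):
                continue
            extra = line.split("\t")[3:]
            if not any(len(f) >= min_length and SEQ_PATTERN.match(f) for f in extra):
                missing.append(lineno)
    return missing
```

An empty list from `rows_missing_primer_sequence("primer.bed")` suggests that every primer row has a sequence column; any reported line numbers point to rows worth inspecting by hand.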

Output Files

  • Consolidated Reads: Simulated reads from all samples are at outdir/reads.fastq.
  • Proportions: Assigned proportions are recorded in outdir/sample_proportions.txt.
  • Quality Metrics: Check outdir/[sample_name]/amplicon_stats.csv for information on amplicon dropouts, mismatches, and ambiguous bases.
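A quick sanity check after a run is to count the reads in the consolidated FASTQ file. The sketch below assumes the standard four-line FASTQ record layout with no wrapped sequences; `count_fastq_records` is a hypothetical helper, and the path in the usage note follows the `--outdir results/` examples above.

```python
def count_fastq_records(path):
    """Count reads in a FASTQ file, assuming four lines per record."""
    with open(path) as handle:
        # Ignore blank lines (for example, a trailing newline at end of file).
        n_lines = sum(1 for line in handle if line.strip())
    if n_lines % 4:
        raise ValueError(f"{path}: line count {n_lines} is not a multiple of 4")
    return n_lines // 4
```

For example, `count_fastq_records("results/reads.fastq")` returns the total number of simulated reads, which can be compared against the expected total for the chosen `--readcnt`.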

🎓 Citation

If you use this workflow in a paper, please cite the original repository: https://github.com/andersen-lab/Bygul

