Amplicon design tool for TB drug resistance

TOAST (Tuberculosis Optimized Amplicon Sequencing Tool)

We present TOAST, a software tool designed to streamline and optimize amplicon primer design for Mycobacterium tuberculosis sequencing. TOAST integrates the robust primer design capabilities of Primer3—accounting for Tm, homopolymers, hairpins, and homodimers—with an in-house pipeline that rigorously filters for heterodimer formation and unintended alternative binding. What sets TOAST apart is its automation and intelligence: it leverages a curated database of over 50 M. tuberculosis genomes to inform amplicon placement, ensuring robust primer performance across strain diversity. Users can prioritize SNPs for coverage, focus on specific resistance genes, and tailor designs to spoligotype backgrounds. The tool outputs primer sequences along with detailed thermodynamic profiles and genomic coordinates, making it an end-to-end solution for targeted TB panel design.


Key Functionalities

1. Amplicon Design (toast design)

Main Inputs:

  • SNP priority lists (-s): Default is globally collated clinical TB SNPs
  • Reference genome files (-ref): Default is MTB-h37rv genome
  • Spoligotype sequencing range files (-sp_f)
  • Optional user-defined primers (-ud) and custom genomic features (-gff)

Configurable Settings:

  • Amplicon size (-a)
  • Padding around target regions (-p): Default is amplicon size divided by 6
  • Number of specific amplicons (-sn) and targeted gene names (-sg)
  • Number of non-specific amplicons (-nn)
  • Graphical output option (-g) to visualize amplicon coverage
  • All SNPs in the reference genome (-all_snp): uses all frequent SNPs from the default database to design degenerate primers. For other species, a custom SNP file in the same format can be provided to override the default.

Outputs:

  • Amplicon sequences and primer details organized into a user-specified output directory (-op)
  • Optional graphical representations of designed amplicons

Example Usage:

toast design -op ./output -a 400 -sn 2 -sg rpoB,katG -nn 20

This command designs amplicons of 400 base pairs, including two specifically targeting the rpoB and katG genes, and 20 additional amplicons for prioritized SNP coverage.
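The example command can also be assembled programmatically, which is convenient when scripting several designs. The following is a minimal sketch and not part of TOAST itself: the helper names `default_padding` and `build_design_command` are hypothetical, only flags listed above are used, and it assumes that `-p` takes a number of bases and that the default padding is rounded down to a whole number.

```python
import shlex

def default_padding(amplicon_size):
    # The documented default is amplicon size divided by 6.
    # Rounding down to an integer is an assumption of this sketch.
    return amplicon_size // 6

def build_design_command(output_dir, amplicon_size, specific_genes=(),
                         specific_n=0, non_specific_n=0, padding=None):
    # Build the argument list for `toast design` from the documented flags.
    args = ["toast", "design", "-op", output_dir, "-a", str(amplicon_size)]
    if specific_genes:
        args += ["-sn", str(specific_n), "-sg", ",".join(specific_genes)]
    args += ["-nn", str(non_specific_n)]
    if padding is not None:
        # Assumption: -p accepts the padding as a number of bases.
        args += ["-p", str(padding)]
    return args

cmd = build_design_command("./output", 400, ["rpoB", "katG"], 2, 20)
print(shlex.join(cmd))
print(default_padding(400))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where TOAST is installed.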


2. Amplicon Number Estimation (toast amplicon_no)

Purpose:
Estimate the number of amplicons required to achieve desired SNP coverage in TB genomic studies.

Main Inputs:

  • SNP priority files (-s)
  • Reference genome (-ref)

Settings:

  • Desired amplicon size (-a)
  • Target coverage depth
  • Optional graphical output for coverage estimates (-g)

Outputs:
Estimates and coverage graphics saved in the specified output directory (-op).
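
For intuition about how amplicon size relates to the number of amplicons needed, the sketch below counts the fewest fixed-size windows that cover a list of SNP positions. It is an illustration only and is not the algorithm TOAST uses: the function name `estimate_amplicon_count` is hypothetical, SNP priorities and primer constraints are ignored, and the padding is assumed to be unusable sequence at both ends of each amplicon.

```python
def estimate_amplicon_count(positions, amplicon_size, padding=0):
    # Usable bases per amplicon once padding is removed from both ends
    # (an assumption of this sketch).
    usable = amplicon_size - 2 * padding
    if usable <= 0:
        raise ValueError("padding leaves no usable bases in the amplicon")
    count = 0
    window_end = None
    # Greedy cover: start a new window at the first uncovered position.
    for pos in sorted(set(positions)):
        if window_end is None or pos > window_end:
            count += 1
            window_end = pos + usable - 1
    return count

snp_positions = [321168, 551767, 1017188, 1119158, 1119347, 1414872]
print(estimate_amplicon_count(snp_positions, 400))
```

Here the two positions in the 1119xxx range are 189 bases apart, so a single 400 bp window covers both and five windows cover all six positions.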


3. Visualization and Plotting (toast plotting)

Purpose:
Visualize and analyze the coverage and distribution of designed amplicons.

Main Inputs:

  • SNP priority files (-s)
  • Genomic feature files (GFF, -gff)
  • Primer sequences and reference designs

Settings:

  • Read size specifications

Outputs:
Visualization graphics, including coverage plots, available in the specified output directory (-op).


Quick Start

To view available command-line options and their defaults, use:

toast design -h
toast amplicon_no -h
toast plotting -h

Workflow

Before running the tool

  • Decide on SNP priority by modifying the SNP priority file (mutation_priority_example.csv, available in the GitHub repo)
  • Decide on amplicon size

Dependencies

  • pandas
  • numpy
  • plotly
  • rich_argparse
  • tabulate
  • primer3-py >= 2.0.1
  • python = 3.11

  1. Estimate amplicon number needed for coverage (amplicon_no function)
    • Example:
     toast amplicon_no -a 800 -op ./cache/Amplicon_design_output -g
    
  2. Run amplicon design (design function)
    • Example:
    toast design -op ./cache/Amplicon_design_output -a 400 -sn 1 -sg rpoB,katG -nn 40 
    
    toast design -op ./cache/Amplicon_design_output -a 400 -sn 1 -sg rpoB,katG -nn 25
    
    toast design -op ./cache/output -a 1000 -nn 4 -ud ./cache/test_df.csv
    
    toast design -op ./cache/output -a 1000 -nn 26
    
    toast design -op ./cache/Amplicon_design_output -a 400 -sn 1 -sg rpsL -nn 0 -ud ./cache/test_df.csv
    
  3. Check amplicon design using coverage plot (plotting function)
    • Example:
    toast plotting -ap ./toast/Amplicon_design_output/Primer_design-accepted_primers-23-400.csv -rp ./toast/db/reference_design.csv -op ./cache/Amplicon_design_output -r 400
    

Primer3 Configuration Parameters (default file: db/default_primer_design_setting.txt)

  • PRIMER_NUM_RETURN: Number of primer pairs to return.
  • PRIMER_PICK_INTERNAL_OLIGO: Flag to pick internal oligos (0 for no, 1 for yes).
  • PRIMER_INTERNAL_MAX_SELF_END: Maximum 3' end self-complementarity score for internal oligos.
  • PRIMER_MIN_SIZE: Minimum primer size in bases.
  • PRIMER_MAX_SIZE: Maximum primer size in bases.
  • PRIMER_MIN_TM: Minimum melting temperature (Tm) for primers in °C.
  • PRIMER_MAX_TM: Maximum melting temperature (Tm) for primers in °C.
  • PRIMER_MIN_GC: Minimum GC content in percent for primers.
  • PRIMER_MAX_GC: Maximum GC content in percent for primers.
  • PRIMER_MAX_POLY_X: Maximum length of mononucleotide repeats in primers.
  • PRIMER_INTERNAL_MAX_POLY_X: Maximum length of mononucleotide repeats in internal oligos.
  • PRIMER_SALT_MONOVALENT: Concentration of monovalent salts (e.g., Na+, K+) in mM.
  • PRIMER_DNA_CONC: Concentration of annealing oligos (primers) in nM.
  • PRIMER_MAX_NS_ACCEPTED: Maximum number of unknown bases (N's) accepted in primers.
  • PRIMER_MAX_SELF_ANY: Maximum overall self-complementarity score for primers.
  • PRIMER_MAX_SELF_END: Maximum 3' end self-complementarity score for primers.
  • PRIMER_PAIR_MAX_COMPL_ANY: Maximum overall complementarity score between primer pairs.
  • PRIMER_PAIR_MAX_COMPL_END: Maximum 3' end complementarity score between primer pairs.
  • PRIMER_PRODUCT_SIZE_RANGE: Range of acceptable primer product sizes (e.g., "100-300").
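
Before passing a custom settings file to the tool, it can help to check that the minimum and maximum parameters listed above are mutually consistent. The sketch below is a hedged illustration: the values in `EXAMPLE_SETTINGS` are made up for demonstration and are not TOAST's shipped defaults, and `check_settings` is a hypothetical helper, not part of TOAST or Primer3.

```python
# Illustrative values only; these are NOT TOAST's shipped defaults.
EXAMPLE_SETTINGS = {
    "PRIMER_MIN_SIZE": 18,
    "PRIMER_MAX_SIZE": 25,
    "PRIMER_MIN_TM": 58.0,
    "PRIMER_MAX_TM": 62.0,
    "PRIMER_MIN_GC": 40.0,
    "PRIMER_MAX_GC": 60.0,
    "PRIMER_PRODUCT_SIZE_RANGE": "100-300",
}

# Pairs of (minimum, maximum) parameters that should be ordered.
BOUND_PAIRS = [
    ("PRIMER_MIN_SIZE", "PRIMER_MAX_SIZE"),
    ("PRIMER_MIN_TM", "PRIMER_MAX_TM"),
    ("PRIMER_MIN_GC", "PRIMER_MAX_GC"),
]

def parse_size_range(text):
    # Split a range such as "100-300" into integer bounds.
    low, high = (int(part) for part in text.split("-"))
    return low, high

def check_settings(settings):
    # Return a list of human-readable problems; empty means consistent.
    problems = []
    for low_key, high_key in BOUND_PAIRS:
        if settings[low_key] > settings[high_key]:
            problems.append(f"{low_key} is greater than {high_key}")
    low, high = parse_size_range(settings["PRIMER_PRODUCT_SIZE_RANGE"])
    if low > high:
        problems.append("PRIMER_PRODUCT_SIZE_RANGE lower bound exceeds upper bound")
    return problems

print(check_settings(EXAMPLE_SETTINGS))
```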

Example formats of the user-defined files can be found in the user_defined_files/ folder:

  • Configuration Parameters file: default_primer_design_setting.txt
  • User input primer file: user_input_primer.csv

Segmented amplicon design is used to generate amplicons of varying sizes so they can be easily distinguished on an agarose gel. This provides a quick visual check to confirm successful amplification before sequencing, allowing users to validate that each target produced a distinct band. It helps avoid wasting sequencing resources on failed reactions and serves as a practical sanity check in the experimental workflow.

If issues arise during amplicon design—such as no primers being generated—it is most likely due to insufficient padding. A small padding size can restrict the available sequence context needed for effective primer placement. To resolve this, try increasing the padding value to provide more room for the design algorithm to work with.


Output file format

<filetype>-<number of total amplicon designed>-<minimum amplicon size>-<maximum amplicon size>-<step size>-<number of amplicon for each size>

Five different files are produced:

  • Primer_design-accepted_primers: All detailed information about the designed amplicons
  • Amplicon_importance: Number of SNPs covered by each amplicon
  • Amplicon_mapped: BED file that can be used to visualize the amplicons on the genome using tools such as IGV
  • SNP_inclusion: Shows which SNPs are covered
  • Gene_covereage: Shows the percentage of each gene covered
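
When post-processing a results directory, the numeric fields can be recovered from an output file name. The sketch below is an assumption-laden illustration: `parse_output_name` and `label_fields` are hypothetical helpers, the parser simply peels integer fields off the end of the name because the file type itself may contain a hyphen, and the second file name in the example is invented to show the full five-field pattern.

```python
from pathlib import Path

# Field order taken from the file name pattern described above.
FIELD_NAMES = ["total_amplicons", "min_size", "max_size",
               "step_size", "amplicons_per_size"]

def parse_output_name(filename):
    # Drop the extension, then collect trailing integer fields.
    tokens = Path(filename).stem.split("-")
    numbers = []
    while len(tokens) > 1 and tokens[-1].isdigit():
        numbers.insert(0, int(tokens.pop()))
    return "-".join(tokens), numbers

def label_fields(numbers):
    # Only label the numbers when all five documented fields are present.
    if len(numbers) != len(FIELD_NAMES):
        return None
    return dict(zip(FIELD_NAMES, numbers))

print(parse_output_name("Primer_design-accepted_primers-23-400.csv"))
# Hypothetical file name following the full five-field pattern.
filetype, numbers = parse_output_name("Amplicon_importance-20-400-800-100-5.csv")
print(filetype, label_fields(numbers))
```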

Specific mutation (Mutation Priority) file format: The only essential column is the genome position (genome_pos). The other columns must be present, but placeholder values like those below can be used if the real values are unknown. The complete example mutation priority CSV can be found on GitHub: mutation_priority_example.csv

sample_id genome_pos gene change freq type sublin drtype drugs weight
sample_1 321168 gene_1 change_1 1 - - - - 1
sample_2 551767 gene_2 change_2 1 - - - - 1
sample_3 1017188 gene_3 change_3 1 - - - - 1
sample_4 1119158 gene_4 change_4 1 - - - - 1
sample_5 1119347 gene_5 change_5 1 - - - - 1
sample_6 1414872 gene_6 change_6 1 - - - - 1

You can edit this file manually, though a script (GitHub: mutation_priority_gen.py) is also provided to generate a file like the one above:

Example usage:

python mutation_priority_gen.py --positions "322168,553767,1077188" --output <output_path.csv>
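
For readers who want to see what such a file contains, the sketch below writes placeholder rows in the column layout shown above. It is not the code of mutation_priority_gen.py: the helper names are hypothetical, and a comma delimiter is assumed because the file is described as a CSV.

```python
import csv
import io

# Column order copied from the example table above.
COLUMNS = ["sample_id", "genome_pos", "gene", "change", "freq",
           "type", "sublin", "drtype", "drugs", "weight"]

def priority_rows(positions):
    # Fill every column except genome_pos with placeholder values.
    for i, pos in enumerate(positions, start=1):
        yield {
            "sample_id": f"sample_{i}", "genome_pos": int(pos),
            "gene": f"gene_{i}", "change": f"change_{i}", "freq": 1,
            "type": "-", "sublin": "-", "drtype": "-", "drugs": "-",
            "weight": 1,
        }

def write_priority_csv(positions, handle):
    # Assumption: the priority file is comma-separated with a header row.
    writer = csv.DictWriter(handle, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(priority_rows(positions))

buffer = io.StringIO()
write_priority_csv([321168, 551767, 1017188], buffer)
print(buffer.getvalue())
```

To write to disk instead, pass a file opened with `open(path, "w", newline="")` as the handle.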

More complete information with example files can be found in the GitHub repo: https://github.com/linfeng-wang/TOAST

A Docker image is also available on DockerHub: https://hub.docker.com/repository/docker/linfengwang/toast-amplicon/general


REFERENCE: Wang, L., Thawong, N., Thorpe, J., Higgins, M., Ik, M.T.K., Sawaengdee, W., Mahasirimongkol, S., Perdigão, J., Campino, S., Clark, T.G. and Phelan, J.E. (2025). TOAST: a novel tool for designing targeted gene amplicons and an optimised set of primers for high-throughput sequencing in tuberculosis genomic studies. BMC Genomics, 26(1). doi: https://doi.org/10.1186/s12864-025-12247-9
