
QuantMeta determines the absolute abundance of targets in metagenomes spiked with synthetic DNA standards. The tool accurately quantifies targets by establishing (1) entropy-based detection thresholds to confidently determine the presence of targets, and (2) an approach to identify and correct read mapping or assembly errors.


QuantMeta

QuantMeta determines the absolute abundance of targets in metagenomes spiked with synthetic DNA standards. The pipeline incorporates (1) detection thresholds (acting similarly to limit of detection) to determine presence or absence of targets, (2) identification and correction of read mapping errors that affect accuracy of quantification, and (3) determination of absolute abundance of targets.

See Langenfeld et al. (2025) for more details. The pipeline was developed using a set of 86 dsDNA standards developed by Hardwick et al. (2018), complemented by a set of 5 ssDNA standards. However, users can define their own spike-in standards, so the pipeline can be applied to both viral and whole microbial community metagenomes.

Please cite our work:
Langenfeld, K., Hegarty, B., Vidaurri, S., Crossette, E., Duhaime, M. B., & Wigginton, K. R. (2025). Development of a quantitative metagenomic approach to establish quantitative limits and its application to viruses. Nucleic Acids Research, 53(5), gkaf118.

Installation

pip install git+https://github.com/klangenf/QuantMeta.git

Note: Requires Python >=3.14

Dependencies

  • samtools: conda install -c bioconda samtools
  • bioawk: conda install -c bioconda bioawk
  • (if generating detection threshold) bowtie2: conda install -c bioconda bowtie2
  • (if generating detection threshold) seqtk: conda install -c bioconda seqtk
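
Before running any step, it can help to confirm that these external tools are on PATH. The following is a minimal sketch using Python's standard shutil.which; the helper name missing_tools is hypothetical and not part of QuantMeta.

```python
import shutil

# External tools from the Dependencies list above.
REQUIRED = ["samtools", "bioawk"]
# Only needed for the optional detection-threshold pre-step.
DETECTION_ONLY = ["bowtie2", "seqtk"]


def missing_tools(tools):
    """Return the names in `tools` that shutil.which cannot find on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED + DETECTION_ONLY)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All dependencies found.")
```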

Run Instructions

Optional Pre-Step: Generate Detection Threshold

This optional step develops a regression that determines E_detect (the minimum E_rel required for confident detection) as a function of target length. The regression from Langenfeld et al. (2025) may be adopted, but its minimum E_rel is based on the detection criteria proposed by FastViromeExplorer (Tithi et al. 2018): 10% read coverage and an observed/expected read distribution ratio of 0.3. If these parameters are inappropriate for a specific application, we recommend building a new E_detect regression suited to those research needs.

Adjust the minimum coverage (min_coverage) and minimum read distribution (min_distribution) parameters to fit your needs.
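
For intuition about what the two defaults measure, the sketch below computes the covered fraction of a target from per-position read depths and compares it with an expected covered fraction. It assumes the expected fraction follows the Lander-Waterman expression 1 - exp(-N*L/G) for N reads of length L on a target of length G. This only illustrates the coverage and distribution criteria; it is not QuantMeta's E_rel or E_detect calculation, and all function names are hypothetical.

```python
import math


def covered_fraction(depths):
    """Fraction of positions with at least one mapped read."""
    if not depths:
        return 0.0
    return sum(1 for d in depths if d > 0) / len(depths)


def expected_covered_fraction(n_reads, read_len, target_len):
    """Assumed Lander-Waterman expectation: 1 - exp(-N * L / G)."""
    return 1.0 - math.exp(-n_reads * read_len / target_len)


def passes_detection(depths, n_reads, read_len, min_cov=0.1, min_dist=0.3):
    """Apply the 10% coverage and 0.3 observed/expected defaults."""
    if not depths:
        return False
    observed = covered_fraction(depths)
    expected = expected_covered_fraction(n_reads, read_len, len(depths))
    ratio = observed / expected if expected > 0 else 0.0
    return observed >= min_cov and ratio >= min_dist
```

Reads piled into one region can pass the coverage cutoff yet fail the distribution cutoff: 10 reads of 100 bp stacked on the first 100 bp of a 1,000-bp target give 10% coverage but an observed/expected ratio of about 0.16.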

Usage:

detection-threshold -c FILE -fq DIR -std FILE -test FILE -tb DIR -tn NAME -min_cov FLOAT -min_dist FLOAT -o DIR -j N

Options:

-c, --config FILE
Config file of samples (default: None, required)
-fq, --fastq-dir DIR
Directory containing deinterleaved fastq files named as {sample_name}_R1.fastq.gz and {sample_name}_R2.fastq.gz (default: Reads/)
-std, --standards FILE
Fasta file with standard sequences (default: Langenfeld_2025_standards.fasta)
-mix, --dsDNA-std-file FILE
Table of dsDNA standards (ID, Mass, Rel_Abund, length) (default: sequins_Mix_A.txt)
-ssmix, --ssDNA-std-file FILE
Optional table of ssDNA standards (ID, Mass, Rel_Abund, length) (default: None; to use the Langenfeld et al. 2025 ssDNA standards, specify ssDNA_stds.txt)
-test, --test-database FILE
Fasta file with test sequences (default: None, required for regression builder)
-tb, --test-bam-dir DIR
Directory containing sorted bam files of mapping reads to test database named as {sample_name}_{test_name}_sorted.bam (default: Mapping/)
-tn, --test-name NAME
Name for test database (default: None, required, used for naming output files)
-min_cov, --min-coverage FLOAT
Minimum read coverage threshold for detection (default: 0.1)
-min_dist, --min-distribution FLOAT
Minimum read distribution threshold for detection (default: 0.3)
-o, --output-dir DIR
Directory for output files (default: QuantMeta/)
-j, --cores N
Number of cores (default: 4)
-h, --help
Show this help message
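
The -fq and -tb options expect files named by the patterns above. A minimal sketch for listing which expected inputs are missing before launching detection-threshold, assuming the config file lists one sample per line (see Input Formatting); read_samples and missing_inputs are hypothetical helpers.

```python
from pathlib import Path


def read_samples(config_path):
    """Read sample names from a config file (one per line, blanks skipped)."""
    lines = Path(config_path).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


def missing_inputs(samples, test_name, fastq_dir="Reads", bam_dir="Mapping"):
    """List expected fastq and bam inputs that do not exist on disk."""
    expected = []
    for s in samples:
        # Deinterleaved reads: {sample_name}_R1.fastq.gz and {sample_name}_R2.fastq.gz
        expected.append(Path(fastq_dir) / f"{s}_R1.fastq.gz")
        expected.append(Path(fastq_dir) / f"{s}_R2.fastq.gz")
        # Sorted test-database mapping: {sample_name}_{test_name}_sorted.bam
        expected.append(Path(bam_dir) / f"{s}_{test_name}_sorted.bam")
    return [p for p in expected if not p.exists()]
```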

Step 1: Generate Standard Curves and Read Mapping Error Assessment/Correction Regressions

This step contains two parts: (1) generating standard curves relating relative abundance to absolute abundance for each sample, and (2) generating regressions for read mapping error assessment and correction. The first part must be performed for every project. For the second part, users are encouraged to generate their own project-specific regressions but may instead use the regressions generated by Langenfeld et al. (2025) in Step 2.

Detecting and correcting quantification errors caused by non-specific mapping or assembly errors requires limits on the acceptable read depth variability across target sequences. The acceptable read depth variability may differ depending on the library preparation, whether PCR amplification was performed, and the sequencing technology used, as each may introduce different biases and change how much read depth variability is intrinsic to sequencing. Langenfeld et al. (2025) used the Swift 1S Plus library prep for simultaneous sequencing of dsDNA and ssDNA with Illumina NovaSeq on SP flowcells to produce 251-bp paired-end reads. We recommend developing read depth variability thresholds specific to each sequencing protocol.

Several variables may need to be manually adjusted in the read depth variability regressions. Ideal read depth variability regressions will (1) minimize RMSE outliers in each regression and (2) produce a Q-Q plot that follows the 1:1 line reasonably closely, which guards against over- or under-fitting the data.

In Langenfeld et al. (2025), the standards were binned based on the average read depth across a target sequence in 0-10, 10-100, 100-1000, and >1000 reads/bp. These bins may need to be adjusted depending on the sequencing depth and sample diversity.
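
Assigning a standard or target to one of these bins only requires its average read depth. A minimal sketch with the Langenfeld et al. (2025) bin edges, assuming a depth that falls exactly on an edge belongs to the higher bin (consistent with the >=1,000 reads/bp label used in Outputs); edit BIN_EDGES to change the bins. The names are hypothetical.

```python
import bisect

# Upper edges (reads/bp) of the first three bins; adjust for your data.
BIN_EDGES = [10, 100, 1000]
BIN_LABELS = ["0-10", "10-100", "100-1000", ">=1000"]


def depth_bin(mean_depth, edges=BIN_EDGES):
    """Return the 0-based bin index for an average read depth (reads/bp).

    bisect_right places a value equal to an edge in the higher bin.
    """
    return bisect.bisect_right(edges, mean_depth)
```

The returned index lines up with the numbered Step 1 outputs: index 0 corresponds to reg1.pkl and func1.pkl, index 3 to reg4.pkl and func4.pkl.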

The terms in each equation may vary slightly. According to Browne et al. (2020), Illumina technologies introduce a quadratic GC bias in sequencing results. Additionally, the total average read depth and E_rel may also significantly affect the observed read depth variability. In Langenfeld et al. (2025), E_rel was a significant factor only for the least abundant standards with incomplete coverage (i.e., the 0-10 reads/bp bin).
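
To make the quadratic GC term concrete, the sketch below computes GC content in sliding windows (using the default window size of 49 from -w) and builds a predictor row containing an intercept, GC, GC squared, average read depth, and optionally E_rel. The exact terms and windowing QuantMeta uses are not confirmed here; window_gc and design_row are hypothetical names.

```python
def window_gc(seq, window=49):
    """GC fraction of every full-length sliding window (step of 1 bp)."""
    seq = seq.upper()
    if len(seq) < window:
        return []
    is_gc = [1 if base in "GC" else 0 for base in seq]
    count = sum(is_gc[:window])
    fractions = [count / window]
    # Slide the window: add the entering base, drop the leaving base.
    for i in range(window, len(seq)):
        count += is_gc[i] - is_gc[i - window]
        fractions.append(count / window)
    return fractions


def design_row(gc, mean_depth, e_rel=None):
    """Hypothetical predictors: intercept, GC, GC^2, depth, and E_rel if used."""
    row = [1.0, gc, gc * gc, mean_depth]
    if e_rel is not None:
        # Per the text, E_rel mattered mainly in the 0-10 reads/bp bin.
        row.append(e_rel)
    return row
```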

Usage:

standard-curve -c FILE -b DIR -std FILE -mix FILE -spike FILE -detect FILE -w N -o DIR -j N

Options:

-c, --config FILE
Config file of samples (default: None, required)
-b, --bam-dir DIR
Directory containing sorted bam files of mapping reads to standards named as {sample_name}_standards_sorted.bam (default: Mapping/)
-std, --standards FILE
Fasta file with standard sequences (default: Langenfeld_2025_standards.fasta)
-mix, --dsDNA-std-file FILE
Table of dsDNA standards (ID, Mass, Rel_Abund, length) (default: sequins_Mix_A.txt)
-ssmix, --ssDNA-std-file FILE
Optional table of ssDNA standards (ID, Mass, Rel_Abund, length) (default: None; to use the Langenfeld et al. 2025 ssDNA standards, specify ssDNA_stds.txt)
-spike, --spike-in-info FILE
Table of sample-specific spike-in information (Sample, Library_Mass (ng), DNA_Extract_Conc (ng/µL), Spike_Frac, ssDNA (0/Spike_Frac)) (default: None, required)
-detect, --detection-threshold FILE
Detection threshold json file (default: detection/Langenfeld_2025_E_detect.json)
-w, --window-size N
Window size for sliding window analysis (default: 49)
-o, --output-dir DIR
Directory for output files (default: QuantMeta/)
-j, --cores N
Number of cores (default: 4)
-h, --help
Show this help message

Step 2: Quantify Target Genes or Genomes

This step quantifies target sequences by:
1. Assessing detection thresholds for targets
2. Correcting for non-specific mapping or assembly error artifacts
3. Converting relative abundances to absolute concentrations in DNA extracts (copies/µL DNA extract)

Usage:

quant-targets -c FILE -s FILE -T FILE -Tb DIR -N NAME -w N -detect FILE -rdv1 FILE -rdv2 FILE -rdv3 FILE -rdv4 FILE -rmse1 FILE -rmse2 FILE -rmse3 FILE -rmse4 FILE -o DIR -j N

Options:

-c, --config FILE
Config file of samples (default: None, required)
-s, --spike-info FILE
Table of spike-in information (Sample, Library Mass, DNA Concentration, Spike-in Fraction, ssDNA Fraction) (default: None, required)
-T, --targets FILE
Fasta file with target sequences (default: None, required)
-Tb, --targets-bam-dir DIR
Directory containing sorted bam files of mapping reads to target database named as {sample_name}_{target_name}_sorted.bam (default: Mapping/)
-N, --target-name NAME
Name for target database (default: None, required, used for naming output files)
-w, --window-size N
Window size for sliding window analysis; must match the value used with standard-curve in Step 1 (default: 49)
-detect, --detection-threshold FILE
Detection threshold json file (default: detection/Langenfeld_2025_E_detect.json)
-rdv1, --read-depth-variability-model1 FILE
Read depth variability model for 0-10 reads/bp (recommended: the Step 1 output Regressions/read_depth_variability/reg1.pkl; default: read_depth_variability/Langenfeld_2025_0to10readsperbp.pkl)
-rdv2, --read-depth-variability-model2 FILE
Read depth variability model for 10-100 reads/bp (recommended: the Step 1 output Regressions/read_depth_variability/reg2.pkl; default: read_depth_variability/Langenfeld_2025_10to100readsperbp.pkl)
-rdv3, --read-depth-variability-model3 FILE
Read depth variability model for 100-1000 reads/bp (recommended: the Step 1 output Regressions/read_depth_variability/reg3.pkl; default: read_depth_variability/Langenfeld_2025_100to1000readsperbp.pkl)
-rdv4, --read-depth-variability-model4 FILE
Read depth variability model for >1000 reads/bp (recommended: the Step 1 output Regressions/read_depth_variability/reg4.pkl; default: read_depth_variability/Langenfeld_2025_gte1000readsperbp.pkl)
-rmse1, --rmse-cutoff-function1 FILE
RMSE threshold function model for 0-10 reads/bp (recommended: the Step 1 output Regressions/threshold_read_depth_variability/func1.pkl; default: threshold_read_depth_variability/Langenfeld_2025_0to10readsperbp.pkl)
-rmse2, --rmse-cutoff-function2 FILE
RMSE threshold function model for 10-100 reads/bp (recommended: the Step 1 output Regressions/threshold_read_depth_variability/func2.pkl; default: threshold_read_depth_variability/Langenfeld_2025_10to100readsperbp.pkl)
-rmse3, --rmse-cutoff-function3 FILE
RMSE threshold function model for 100-1000 reads/bp (recommended: the Step 1 output Regressions/threshold_read_depth_variability/func3.pkl; default: threshold_read_depth_variability/Langenfeld_2025_100to1000readsperbp.pkl)
-rmse4, --rmse-cutoff-function4 FILE
RMSE threshold function model for >1000 reads/bp (recommended: the Step 1 output Regressions/threshold_read_depth_variability/func4.pkl; default: threshold_read_depth_variability/Langenfeld_2025_gte1000readsperbp.pkl)
-o, --output-dir DIR
Directory for project outputs (default: QuantMeta/)
-j, --cores N
Number of cores (default: 4)
-h, --help
Show this help message
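
Because Step 2 takes eight model files from Step 1, assembling the command programmatically helps avoid mixing up bins. A minimal sketch that builds the quant-targets argument list from the Step 1 output layout described under Outputs, using only the options documented above. quant_targets_cmd is a hypothetical helper, and regressions_dir should point at wherever your Step 1 run wrote its Regressions/ folder (the default here is an assumption).

```python
from pathlib import Path


def quant_targets_cmd(config, spike_info, targets, target_name,
                      regressions_dir="Regressions", window=49, cores=4):
    """Build a quant-targets argument list that uses Step 1 regressions."""
    rdv_dir = Path(regressions_dir) / "read_depth_variability"
    rmse_dir = Path(regressions_dir) / "threshold_read_depth_variability"
    # -w must match the window size used with standard-curve in Step 1.
    cmd = ["quant-targets", "-c", str(config), "-s", str(spike_info),
           "-T", str(targets), "-N", target_name, "-w", str(window)]
    # Bins 1-4: 0-10, 10-100, 100-1000, and >1000 reads/bp.
    for i in range(1, 5):
        cmd += [f"-rdv{i}", str(rdv_dir / f"reg{i}.pkl")]
        cmd += [f"-rmse{i}", str(rmse_dir / f"func{i}.pkl")]
    cmd += ["-j", str(cores)]
    return cmd


# Usage (requires QuantMeta to be installed):
# import subprocess
# subprocess.run(quant_targets_cmd("samples.txt", "spike_info.txt",
#                                  "targets.fasta", "viral"), check=True)
```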

Input Formatting

-c, --config FILE

  • txt file with samples listed one per line

-s, --spike-info FILE

  • Tab-separated txt file with 5 columns (Sample, Library_Mass, DNA_Extract_Conc, Spike_Frac, ssDNA)
    • Sample: sample names
    • Library_Mass: mass of DNA used in library preparation (ng)
    • DNA_Extract_Conc: concentration of DNA in the extract used for library preparations (ng/µL)
    • Spike_Frac: fraction of total DNA from standards (example: 1% standard spike-in would be 0.01)
    • ssDNA: fraction of total DNA from ssDNA standards (if no ssDNA standards were added, input 0)
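
A minimal sketch for writing the config file and the spike-in table with Python's csv module. The example values in the usage below are invented, and writing a header row is an assumption; check whether your QuantMeta version expects one. write_inputs is a hypothetical helper.

```python
import csv
from pathlib import Path

SPIKE_COLUMNS = ["Sample", "Library_Mass", "DNA_Extract_Conc", "Spike_Frac", "ssDNA"]


def write_inputs(rows, config_path, spike_path):
    """Write the sample config (one name per line) and the spike-in TSV."""
    Path(config_path).write_text("".join(f"{r['Sample']}\n" for r in rows))
    with open(spike_path, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=SPIKE_COLUMNS,
                                delimiter="\t", lineterminator="\n")
        writer.writeheader()  # assumption: header row is accepted
        writer.writerows(rows)


# Usage with invented values: 10 ng library, 2.5 ng/µL extract,
# 1% spike-in, no ssDNA standards.
# write_inputs([{"Sample": "S1", "Library_Mass": 10, "DNA_Extract_Conc": 2.5,
#                "Spike_Frac": 0.01, "ssDNA": 0}], "samples.txt", "spike_info.txt")
```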

-mix, --dsDNA-std-file FILE

  • Tab-separated txt file with 4 columns (ID, Mass, Rel_Abund, length)
    • ID: individual spike-in standard names
    • Mass: mass of standard (ng)
    • Rel_Abund: fraction of standard in the mix
    • length: length of the standard (bp)

-ssmix, --ssDNA-std-file FILE

  • Tab-separated txt file with 4 columns (ID, Mass, Rel_Abund, length) -- same format as -mix/--dsDNA-std-file
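
The known concentration of each standard follows from the spike-in and mix tables. The sketch below shows one common mass-to-copies conversion, assuming an average of 650 g/mol per base pair for dsDNA and that each standard's mass in the library is Library_Mass * Spike_Frac * Rel_Abund. QuantMeta's internal calculation (including how ssDNA standards and the Mass column are used) may differ; spiked_copies_per_ul is a hypothetical name.

```python
AVOGADRO = 6.02214076e23  # copies per mol
DS_BP_MASS = 650.0        # assumed average g/mol per base pair (dsDNA)


def spiked_copies_per_ul(library_mass_ng, extract_conc_ng_ul, spike_frac,
                         rel_abund, length_bp, g_per_mol_per_bp=DS_BP_MASS):
    """Estimate a standard's concentration in copies/µL DNA extract."""
    # Mass of this standard in the library (ng -> g).
    std_mass_g = library_mass_ng * spike_frac * rel_abund * 1e-9
    # Convert mass to copies: mass / molar mass * Avogadro.
    copies = std_mass_g * AVOGADRO / (length_bp * g_per_mol_per_bp)
    # Volume of DNA extract represented by the library mass.
    extract_volume_ul = library_mass_ng / extract_conc_ng_ul
    return copies / extract_volume_ul
```

Under these assumptions Library_Mass cancels out: the library holds Library_Mass / DNA_Extract_Conc µL worth of extract, so the result depends only on DNA_Extract_Conc, Spike_Frac, Rel_Abund, and length.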

Outputs

Optional Pre-Step: Generate Detection Threshold

Regressions/detection/detection_threshold_custom.json: Custom detection threshold generated for each sample.
Regressions/detection/optimal_thresholds_plot.png: Visualization of the detection threshold generated for each sample (custom version of Figure S4 in Langenfeld et al. 2025).

Step 1: Generate Standard Curves and Read Mapping Error Assessment/Correction Regressions

Regressions/quantification/{sample_name}_quantmeta_results.txt: Table of individual standards "known_conc" (concentration spiked in copies/µL DNA extract) and "predicted_conc" (observed abundance in metagenome when taking into account library prep mass and DNA concentration in submitted sample). These are the individual points used in a linear regression to relate absolute to relative abundances in the log-scale standard curve.
Regressions/quantification/{sample_name}_rel_to_abs.pkl: Standard curve used by quant-targets: the log-scale linear regression relating absolute to relative abundances of spike-in standards.
Regressions/quantification/{sample_name}_standards_rel_to_abs.png: Visualization of the standard curve.

Regressions/read_depth_variability/boxplot_regressions.png: Visualization of normalized RMSE for each of the average read depth per bp bins (0-10 reads/bp, 10-100 reads/bp, 100-1,000 reads/bp, >=1,000 reads/bp).
Regressions/read_depth_variability/qq_plots_regressions.png: Q-Q plots of the residuals for each of the average read depth per bp bins (custom version of Figure S5 in Langenfeld et al. 2025). The blue dots should not deviate significantly from the red line; large deviations indicate a poor fit of the read depth variability regression for that read depth per bp bin.
Regressions/read_depth_variability/reg1.pkl: Read depth variability regression for the 0-10 reads/bp bin.
Regressions/read_depth_variability/reg2.pkl: Read depth variability regression for the 10-100 reads/bp bin.
Regressions/read_depth_variability/reg3.pkl: Read depth variability regression for the 100-1,000 reads/bp bin.
Regressions/read_depth_variability/reg4.pkl: Read depth variability regression for the >=1,000 reads/bp bin.
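
The Q-Q check compares sorted, standardized residuals against normal quantiles. A minimal sketch with statistics.NormalDist, assuming Blom-style plotting positions (i - 0.375) / (n + 0.25); plotting libraries may use slightly different positions, and qq_points is a hypothetical helper.

```python
from statistics import NormalDist, fmean, stdev


def qq_points(residuals):
    """Return (theoretical normal quantile, standardized residual) pairs."""
    n = len(residuals)
    if n < 2:
        raise ValueError("need at least two residuals")
    mu, sd = fmean(residuals), stdev(residuals)
    ordered = sorted((r - mu) / sd for r in residuals)
    normal = NormalDist()  # standard normal: mean 0, sd 1
    # Blom plotting positions (i - 0.375) / (n + 0.25), i = 1..n
    theoretical = [normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))
```

Points lying near the 1:1 line (theoretical quantile equal to standardized residual) correspond to a well-behaved regression in that bin.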

Regressions/threshold_read_depth_variability/RMSE_scatter.png: Visualization of the RMSE of each standard at its average read depth, grouped into the average read depth per bp bins.
Regressions/threshold_read_depth_variability/RMSE_thresholds.png: Visualization of the RMSE thresholds for each average read depth per bp bin (custom version of Figure 3B in Langenfeld et al. 2025).
Regressions/threshold_read_depth_variability/func1.pkl: RMSE threshold for read depth variability for the 0-10 reads/bp bin.
Regressions/threshold_read_depth_variability/func2.pkl: RMSE threshold for read depth variability for the 10-100 reads/bp bin.
Regressions/threshold_read_depth_variability/func3.pkl: RMSE threshold for read depth variability for the 100-1,000 reads/bp bin.
Regressions/threshold_read_depth_variability/func4.pkl: RMSE threshold for read depth variability for the >=1,000 reads/bp bin.
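
A minimal sketch of the kind of RMSE comparison behind these thresholds: compute the RMSE between observed and model-predicted read depth across a target, normalize it by the mean observed depth, and compare it with a cutoff. Normalizing by the mean is an assumption, and in QuantMeta the cutoff comes from a fitted function saved in func1.pkl to func4.pkl rather than the single number used here. All names are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted read depths."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two equal-length, non-empty sequences")
    total = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(total / len(observed))


def normalized_rmse(observed, predicted):
    """RMSE divided by the mean observed depth (assumed normalization)."""
    return rmse(observed, predicted) / (sum(observed) / len(observed))


def exceeds_threshold(observed, predicted, threshold):
    """True when read depth varies more than the cutoff allows."""
    return normalized_rmse(observed, predicted) > threshold
```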

Step 2: Quantify Target Genes or Genomes

Mapping/{sample_name}/{target_name}_mapping_analysis.txt: Detection results for each target including the observed entropy (E_rel) and the minimum entropy required for detection (E_detect).
Results/{sample_name}/{target_name}_concentrations.tsv: Final concentrations of each quantifiable and detected target in the DNA extract (copies/µL DNA extract) with 95% confidence interval values.
Results/{sample_name}/Ind_Correction_Results/{target_name}_corrected_results.tsv: Read mapping error assessment with notation if quantification correction was run and its outcomes. Each individual target undergoing quantification correction will have its own file in this directory with information on where error correction was performed and how the read depth was altered.



Download files


Source Distribution

quantmeta-2.0.2.tar.gz (115.5 kB)

Uploaded Source

Built Distribution


quantmeta-2.0.2-py3-none-any.whl (119.0 kB)

Uploaded Python 3

File details

Details for the file quantmeta-2.0.2.tar.gz.

File metadata

  • Download URL: quantmeta-2.0.2.tar.gz
  • Upload date:
  • Size: 115.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for quantmeta-2.0.2.tar.gz
Algorithm Hash digest
SHA256 36ace042bea37be768555b1c63352e41d769947a6de111b9a69214dc6b88f2b7
MD5 55bf0a4d94bb9c7d97c74601fb6e0ff3
BLAKE2b-256 5a80f9c93f62b7beacfd43288f425d17e2bdb5e6c40168bcfd9a8aab3e1c95c7
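
To check a downloaded file against the SHA256 digests listed here, a minimal sketch with Python's hashlib (the helper name sha256_of is hypothetical):

```python
import hashlib

EXPECTED_SDIST_SHA256 = "36ace042bea37be768555b1c63352e41d769947a6de111b9a69214dc6b88f2b7"


def sha256_of(path, chunk_size=1 << 20):
    """Hex SHA256 digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


# Example: sha256_of("quantmeta-2.0.2.tar.gz") == EXPECTED_SDIST_SHA256
```

pip can also enforce digests itself in hash-checking mode (--require-hashes, with --hash entries in a requirements file).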


File details

Details for the file quantmeta-2.0.2-py3-none-any.whl.

File metadata

  • Download URL: quantmeta-2.0.2-py3-none-any.whl
  • Upload date:
  • Size: 119.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for quantmeta-2.0.2-py3-none-any.whl
Algorithm Hash digest
SHA256 b5cfaad837dad13584188b198820541c584903728da82bd404420e37e8897b9b
MD5 2c04451f12d448c6784e968b3b4257cb
BLAKE2b-256 483c52f039524e42a33e5a8b6de6b38bc0e7d314f731a79f37c46fa6d278314f

