GuaCAMOLE

GC-aware species abundance estimation from metagenomic data

Overview

GuaCAMOLE estimates and corrects for the GC bias inherent in most metagenomic sequencing libraries. GuaCAMOLE is based on Bracken and relies on Kraken2 for read classification; see our publication for an in-depth description and evaluation of the algorithm. For instructions on how to run GuaCAMOLE on your own dataset, see Usage below.

Features

Create GC reference distributions from Kraken2 databases.
Estimate species abundances using GC-aware correction.
Generate detailed plots for data visualization.

Installation

You can install the package directly from a folder:

pip install .

Or from GitHub:

pip install git+https://github.com/Cibiv/GuaCAMOLE.git

To install the quadratic programming solver:

pip install "qpsolvers[cvxopt]"

Testing

The demo_data folder contains a Kraken2 database with the 19 bacterial species found in the mock community of Tourlousse et al. [1], already prepared for use by Bracken and GuaCAMOLE. The folder also contains a 1% subsample of the metagenomic sequencing library SRR12996245, which represents that mock community, and the Kraken2 output for that subsample. To run GuaCAMOLE on this data, run

./SRR12996245.1pct.sh

in that folder. The SRR12996245.1pct.sh script unzips the FASTQ files and Kraken2 results, changes into the subdirectory out/, and runs GuaCAMOLE with

guacamole \
	--output SRR12996245.1pct.guaca \
	--kraken_report ../SRR12996245.1pct_report.txt \
	--kraken_file ../SRR12996245.1pct.kraken \
	--read_files ../SRR12996245.1pct_1.fastq ../SRR12996245.1pct_2.fastq \
	--kraken_db ../demo_db \
	--read_len 150 \
	--fragment_len 400 \
	--length_correction True \
	--threshold 5 \
	--plot True

Docker

GuaCAMOLE is also available as a Docker image, laurenz0908/guacamole:latest. To test GuaCAMOLE with the Docker image, first create the output folder:

mkdir guacamole_output

Then start the container interactively:

docker run -it \
  -v "$(pwd)/guacamole_output:/app/demo_data/out" \
  -w "/app/demo_data/out" \
  laurenz0908/guacamole:latest

Then, from the shell inside the container, run

guacamole \
        --output SRR12996245.1pct.guaca \
        --kraken_report ../SRR12996245.1pct_report.txt \
        --kraken_file ../SRR12996245.1pct.kraken \
        --read_files ../SRR12996245.1pct_1.fastq ../SRR12996245.1pct_2.fastq \
        --kraken_db ../demo_db \
        --read_len 150 \
        --fragment_len 400 \
        --length_correction True \
        --threshold 5 \
        --plot True

You can leave the container with exit. The output files of GuaCAMOLE should now be in the guacamole_output directory.

Usage

1. Building a Kraken2 database

To build a standard Kraken2 database compatible with GuaCAMOLE, run

kraken2-build --fast-build --standard --db path/to/kraken_db \
              --threads number_of_cpus_to_use --no-masking

Existing Kraken2 databases can be used, provided that they have been built with the --no-masking option. Masked databases cannot be used with GuaCAMOLE.

2. Create Reference Distribution

GuaCAMOLE requires a Bracken database and a reference distribution, both of which must be created for a read length matching that of the data. If the --fragment_len parameter is also specified when building the reference distribution, the GC content is computed per fragment instead of per read. This can improve the accuracy of GuaCAMOLE, especially if the fragments are much longer than the reads. Both must be built once for every read length (and fragment length, if specified).

bracken-build -d path/to/kraken_db -t number_of_cpus_to_use -l read_length_of_data 
create-reference-dist --lib_path path/to/kraken_db --ncores number_of_cpus_to_use \
                      --read_len read_length_of_data --fragment_len fragment_length_of_data
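To illustrate why per-fragment and per-read GC content can differ, here is a minimal Python sketch. It is not GuaCAMOLE's implementation; the function gc_content and the toy sequence are hypothetical and chosen only to make the effect visible.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    # Sequences without any unambiguous base (e.g. all N) yield 0.0.
    return gc / total if total else 0.0


# Hypothetical 40 bp fragment: a GC-rich start followed by an AT-rich tail.
fragment = "GC" * 5 + "AT" * 15
read_len = 10
read = fragment[:read_len]  # the read covers only the start of the fragment

read_gc = gc_content(read)          # GC content seen at the read level
fragment_gc = gc_content(fragment)  # GC content of the whole fragment

print(f"read GC = {read_gc:.2f}, fragment GC = {fragment_gc:.2f}")
```

In this toy case the read alone looks far more GC-rich than the fragment it was sequenced from, which is the situation in which specifying --fragment_len may matter most.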

3. Run GuaCAMOLE for Species Abundance Estimation

To estimate species abundances from your data, the reads are first assigned to taxa with Kraken2, and the Kraken2 output is then processed with GuaCAMOLE. GuaCAMOLE includes Bracken, so no separate invocation of Bracken is necessary.

kraken2 --db path/to/kraken_db --threads number_of_cpus_to_use --report path/to/kraken_report \
        --paired path/to/reads_1.fastq path/to/reads_2.fastq \
        > path/to/kraken_file
guacamole --kraken_report path/to/kraken_report --kraken_file path/to/kraken_file --kraken_db path/to/kraken_db \
          --read_files path/to/reads_1.fastq path/to/reads_2.fastq \
          --read_len read_length_of_data --fragment_len fragment_length_of_data \
          --output result.txt
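When GuaCAMOLE is called from a Python pipeline, it can help to assemble the argument list programmatically instead of concatenating strings. The sketch below only uses the command-line flags shown above; the helper guacamole_command and all paths are hypothetical, and the command is printed rather than executed.

```python
import shlex


def guacamole_command(kraken_report, kraken_file, kraken_db, read_files,
                      read_len, output, fragment_len=None):
    """Build the argument list for a guacamole call (hypothetical helper)."""
    cmd = [
        "guacamole",
        "--kraken_report", str(kraken_report),
        "--kraken_file", str(kraken_file),
        "--kraken_db", str(kraken_db),
        "--read_files", *[str(path) for path in read_files],
        "--read_len", str(read_len),
        "--output", str(output),
    ]
    # --fragment_len is optional, so it is only added when a value is given.
    if fragment_len is not None:
        cmd += ["--fragment_len", str(fragment_len)]
    return cmd


cmd = guacamole_command(
    kraken_report="path/to/kraken_report",
    kraken_file="path/to/kraken_file",
    kraken_db="path/to/kraken_db",
    read_files=["path/to/reads_1.fastq", "path/to/reads_2.fastq"],
    read_len=150,
    output="result.txt",
    fragment_len=400,
)
print(shlex.join(cmd))
```

Passing the list to subprocess.run(cmd, check=True) would execute it, provided guacamole is installed and on the PATH. Using a list avoids shell quoting problems with paths that contain spaces.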

Command-line Options

--kraken_report: Kraken2 report file (required)
--kraken_file: Kraken2 file with classifications (required)
--kraken_db: Path to the Kraken2 database (required)
--read_len: Read length (required)
--output: Output file name (required)
--read_files: Path to input read files (required)
--threshold: Minimum number of reads found for a species to estimate its abundance, default=500
--length_correction: Genome size correction (taxonomic vs. read abundance), default=False
--plot: True if detailed plots should be generated
--fp_cycles: Number of iterations for false positive removal, default=4
--reg_weight: Strength of the regularization (between 0 and 1), default=0.01
--fragment_len: Length of the fragment, if paired-end and known, default=None
--fasta: True if reads are in FASTA format, False if FASTQ, default=False
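The difference between read abundance and taxonomic abundance that --length_correction refers to can be illustrated with a small sketch. It shows one common way to convert between the two, dividing each read fraction by the genome length and renormalizing; whether GuaCAMOLE performs exactly this computation is an assumption, and the taxa, read fractions, and genome lengths below are made up.

```python
def taxonomic_abundance(read_fractions, genome_lengths):
    """Convert read fractions into genome-length-corrected abundances.

    Longer genomes produce more reads per cell, so each read fraction is
    divided by the genome length and the results are renormalized to sum to 1.
    """
    per_base = {
        taxon: read_fractions[taxon] / genome_lengths[taxon]
        for taxon in read_fractions
    }
    total = sum(per_base.values())
    return {taxon: value / total for taxon, value in per_base.items()}


# Made-up example: both taxa receive half of the reads,
# but taxon_B has a genome three times as long as taxon_A.
read_fractions = {"taxon_A": 0.5, "taxon_B": 0.5}
genome_lengths = {"taxon_A": 2_000_000, "taxon_B": 6_000_000}

corrected = taxonomic_abundance(read_fractions, genome_lengths)
for taxon, value in corrected.items():
    print(f"{taxon}\t{value:.2f}")
```

In this example the taxon with the shorter genome ends up with roughly three quarters of the corrected abundance, although both taxa contribute the same number of reads.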

Output

The output has the same format as the tab-delimited Bracken output file, with the following additional columns:

Bracken_estimate, the abundance estimate from Bracken (if length_correction=True, these are corrected for genome length in the same way as the GuaCAMOLE estimates)
GuaCAMOLE_estimate, the abundances estimated using the abundance parameter from the GuaCAMOLE algorithm
GuaCMAOLE_est_eff, the abundances computed using the efficiencies estimated by GuaCAMOLE (this also includes estimates for taxa that GuaCAMOLE labelled as false positives)
GC_content, the GC content of the taxon's genome
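Because the output is tab-delimited, it can be read with Python's standard csv module. The following sketch compares the Bracken and GuaCAMOLE estimates per taxon. The column names Bracken_estimate, GuaCAMOLE_estimate, and GC_content are taken from the list above; the name column is assumed to be inherited from the Bracken format, and the two rows are made-up values rather than real results.

```python
import csv
import io

# Made-up stand-in for a GuaCAMOLE output file; a real file would be the one
# given to --output and would contain further Bracken columns.
example = (
    "name\tBracken_estimate\tGuaCAMOLE_estimate\tGC_content\n"
    "taxon_A\t0.60\t0.50\t0.35\n"
    "taxon_B\t0.40\t0.50\t0.65\n"
)


def load_estimates(handle):
    """Read the per-taxon estimates from a tab-delimited file handle."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        {
            "name": row["name"],
            "bracken": float(row["Bracken_estimate"]),
            "guacamole": float(row["GuaCAMOLE_estimate"]),
            "gc": float(row["GC_content"]),
        }
        for row in reader
    ]


rows = load_estimates(io.StringIO(example))
for row in rows:
    # Positive shift: the GuaCAMOLE estimate is higher than the Bracken one.
    shift = row["guacamole"] - row["bracken"]
    print(f'{row["name"]}\tGC={row["gc"]:.2f}\tshift={shift:+.2f}')
```

For a real file, replace the io.StringIO object with open("result.txt", newline="").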

References

[1] Tourlousse, D.M., Narita, K., Miura, T. et al., 2021. Validation and standardization of DNA extraction and library construction methods for metagenomics-based human fecal microbiome measurements. Microbiome 9, 95. https://doi.org/10.1186/s40168-021-01048-3

Version 1.0.1, released Nov 18, 2025.

