A Python package for working with Bactopia
Project description
bactopia-py
A Python package for working with Bactopia
Bactopia Subcommands
There are many subcommands available in Bactopia. Here is a brief description of each command:
Command | Description |
---|---|
bactopia-citations |
Print out tools and citations used throughout Bactopia |
bactopia-datasets |
Download optional datasets to supplement your analyses with Bactopia |
bactopia-download |
Builds Bactopia environments for use with Nextflow. |
bactopia-prepare |
Create a 'file of filenames' (FOFN) of samples to be processed by Bactopia |
bactopia-search |
Query against ENA and SRA for public accessions to process with Bactopia |
bactopia-summary |
Generate a summary table from the Bactopia results. |
Below is the --help
output for each subcommand.
bactopia-citations
Usage: bactopia-citations [OPTIONS]
Print out tools and citations used throughout Bactopia
╭─ Options ────────────────────────────────────────────────────────────────────────────╮
│ --version -V Show the version and exit. │
│ * --bactopia-path -b TEXT Directory where Bactopia repository is stored │
│ [required] │
│ --name -n TEXT Only print citation matching a given name │
│ --plain-text -p Disable rich formatting │
│ --help Show this message and exit. │
╰──────────────────────────────────────────────────────────────────────────────────────╯
bactopia-datasets
Usage: bactopia-datasets [OPTIONS] [UNKNOWN]...
Download optional datasets to supplement your analyses with Bactopia
╭─ Required Options ───────────────────────────────────────────────────────────────────╮
│ * --bactopia-path TEXT Directory where Bactopia repository is stored [required] │
╰──────────────────────────────────────────────────────────────────────────────────────╯
╭─ Download Related Options ───────────────────────────────────────────────────────────╮
│ --datasets_cache TEXT Base directory to download datasets to (Defaults to env │
│ variable BACTOPIA_CACHEDIR, a subfolder called datasets │
│ will be created) │
│ [default: ${HOME}/.bactopia] │
│ --force Force overwrite of existing pre-built environments. │
│ --max_retry INTEGER Maximum times to attempt creating Conda environment. │
│ (Default: 3) │
╰──────────────────────────────────────────────────────────────────────────────────────╯
╭─ Additional Options ─────────────────────────────────────────────────────────────────╮
│ --verbose Print debug related text. │
│ --silent Only critical errors will be printed. │
│ --version Show the version and exit. │
│ --help Show this message and exit. │
╰──────────────────────────────────────────────────────────────────────────────────────╯
bactopia-download
Usage: bactopia-download [OPTIONS] [UNKNOWN]...
Builds Bactopia environments for use with Nextflow.
╭─ Required Options ───────────────────────────────────────────────────────────────────╮
│ * --bactopia-path TEXT Directory where Bactopia results are stored [required] │
╰──────────────────────────────────────────────────────────────────────────────────────╯
╭─ Build Related Options ──────────────────────────────────────────────────────────────╮
│ --envtype [conda|docker|singularity| The type of environment to │
│ all] build. │
│ [default: conda] │
│ --wf TEXT Build a environment for a │
│ the given workflow │
│ [default: bactopia] │
│ --condadir TEXT Directory to create Conda │
│ environments │
│ (NXF_CONDA_CACHEDIR env │
│ variable takes precedence) │
│ --use_conda Use Conda for building │
│ Conda environments instead │
│ of Mamba │
│ --singularity_cache TEXT Directory to download │
│ Singularity images │
│ (NXF_SINGULARITY_CACHEDIR │
│ env variable takes │
│ precedence) │
│ --singularity_pull_docker… Force conversion of Docker │
│ containers, instead │
│ downloading Singularity │
│ images directly │
│ --force_rebuild Force overwrite of │
│ existing pre-built │
│ environments. │
│ --max_retry INTEGER Maximum times to attempt │
│ creating Conda │
│ environment. (Default: 3) │
╰──────────────────────────────────────────────────────────────────────────────────────╯
╭─ Additional Options ─────────────────────────────────────────────────────────────────╮
│ --verbose Print debug related text. │
│ --silent Only critical errors will be printed. │
│ --version Show the version and exit. │
│ --help Show this message and exit. │
╰──────────────────────────────────────────────────────────────────────────────────────╯
╭─ Options ────────────────────────────────────────────────────────────────────────────╮
│ --build-all Builds all environments for Bactopia workflows │
│ --build-nfcore Builds all nf-core related environments │
╰──────────────────────────────────────────────────────────────────────────────────────╯
bactopia-prepare
Usage: bactopia-prepare [OPTIONS]
Create a 'file of filenames' (FOFN) of samples to be processed by Bactopia
╭─ Required Options ───────────────────────────────────────────────────────────────────╮
│ * --path -p TEXT Directory where FASTQ files are stored [required] │
╰──────────────────────────────────────────────────────────────────────────────────────╯
╭─ Matching Options ───────────────────────────────────────────────────────────────────╮
│ --assembly-ext -a TEXT Extension of the FASTA assemblies [default: .fna.gz] │
│ --fastq-ext -f TEXT Extension of the FASTQs [default: .fastq.gz] │
│ --fastq-separator TEXT Split FASTQ name on the last occurrence of the │
│ separator │
│ [default: _] │
│ --pe1-pattern TEXT Designates difference first set of paired-end reads │
│ [default: [Aa]|[Rr]1|1] │
│ --pe2-pattern TEXT Designates difference second set of paired-end reads │
│ [default: [Bb]|[Rr]2|2] │
│ --merge Flag samples with multiple read sets to be merged by │
│ Bactopia │
│ --ont Single-end reads should be treated as Oxford Nanopore │
│ reads │
│ --recursive -r Directories will be traversed recursively │
│ --prefix TEXT Prefix to add to the path │
╰──────────────────────────────────────────────────────────────────────────────────────╯
╭─ Sample Information Options ─────────────────────────────────────────────────────────╮
│ --metadata TEXT Metadata per sample with genome size and species │
│ information │
│ --genome-size -gsize INTEGER Genome size to use for all samples │
│ --species -s TEXT Species to use for all samples (If available, can be │
│ used to determine genome size) │
│ --taxid TEXT Use the genome size of the Taxon ID for all samples │
╰──────────────────────────────────────────────────────────────────────────────────────╯
╭─ Additional Options ─────────────────────────────────────────────────────────────────╮
│ --examples Print example usage │
│ --verbose Increase the verbosity of output │
│ --silent Only critical errors will be printed │
│ --version -V Show the version and exit. │
│ --help Show this message and exit. │
╰──────────────────────────────────────────────────────────────────────────────────────╯
bactopia-search
Usage: bactopia-search [OPTIONS]
Query against ENA and SRA for public accessions to process with Bactopia
╭─ Required Options ───────────────────────────────────────────────────────────────────╮
│ * --query -q TEXT Taxon ID or Study, BioSample, or Run accession (can also be │
│ comma separated or a file of accessions) │
│ [required] │
╰──────────────────────────────────────────────────────────────────────────────────────╯
╭─ Query Options ──────────────────────────────────────────────────────────────────────╮
│ --exact-taxon Exclude Taxon ID descendants │
│ --limit -l INTEGER Maximum number of results (per query) to return │
│ [default: 1000000] │
│ --accession-limit -al INTEGER Maximum number of accessions to query at once │
│ [default: 5000] │
│ --biosample-subset INTEGER If a BioSample has multiple Experiments, maximum │
│ number to randomly select (0 = disabled) │
│ [default: 0] │
╰──────────────────────────────────────────────────────────────────────────────────────╯
╭─ Filtering Options ──────────────────────────────────────────────────────────────────╮
│ --min-base-count -mbc INTEGER Filters samples based on minimum base pair count │
│ (0 = disabled) │
│ [default: 0] │
│ --min-read-length -mrl INTEGER Filters samples based on minimum mean read length │
│ (0 = disabled) │
│ [default: 0] │
│ --min-coverage -mc INTEGER Filter samples based on minimum coverage (requires │
│ --genome_size, 0 = disabled) │
│ [default: 0] │
╰──────────────────────────────────────────────────────────────────────────────────────╯
╭─ Additional Options ─────────────────────────────────────────────────────────────────╮
│ --genome-size -gsize INTEGER Genome size to be used for all samples, and for │
│ calculating min coverage │
│ [default: 0] │
│ --outdir -o TEXT Directory to write output [default: ./] │
│ --prefix -p TEXT Prefix to use for output file names │
│ [default: bactopia] │
│ --force Overwrite existing reports │
│ --verbose Increase the verbosity of output │
│ --silent Only critical errors will be printed │
│ --version -V Show the version and exit. │
│ --help Show this message and exit. │
╰──────────────────────────────────────────────────────────────────────────────────────╯
bactopia-summary
Usage: bactopia-summary [OPTIONS]
Generate a summary table from the Bactopia results.
╭─ Required Options ───────────────────────────────────────────────────────────────────╮
│ * --bactopia-path -b TEXT Directory where Bactopia results are stored [required] │
╰──────────────────────────────────────────────────────────────────────────────────────╯
╭─ Gold Cutoffs ───────────────────────────────────────────────────────────────────────╮
│ --gold-coverage -gcov INTEGER Minimum amount of coverage required for Gold │
│ status │
│ [default: 100] │
│ --gold-quality -gqual INTEGER Minimum per-read mean quality score required │
│ for Gold status │
│ [default: 30] │
│ --gold-read-length -glen INTEGER Minimum mean read length required for Gold │
│ status │
│ [default: 95] │
│ --gold-contigs -gcontigs INTEGER Maximum contig count required for Gold │
│ status │
│ [default: 100] │
╰──────────────────────────────────────────────────────────────────────────────────────╯
╭─ Silver Cutoffs ─────────────────────────────────────────────────────────────────────╮
│ --silver-coverage -scov INTEGER Minimum amount of coverage required for │
│ Silver status │
│ [default: 50] │
│ --silver-quality -squal INTEGER Minimum per-read mean quality score │
│ required for Silver status │
│ [default: 20] │
│ --silver-read-length -slen INTEGER Minimum mean read length required for │
│ Silver status │
│ [default: 75] │
│ --silver-contigs -scontigs INTEGER Maximum contig count required for Silver │
│ status │
│ [default: 200] │
╰──────────────────────────────────────────────────────────────────────────────────────╯
╭─ Fail Cutoffs ───────────────────────────────────────────────────────────────────────╮
│ --min-coverage -mincov INTEGER Minimum amount of coverage required to pass │
│ [default: 20] │
│ --min-quality -minqual INTEGER Minimum per-read mean quality score │
│ required to pass │
│ [default: 12] │
│ --min-read-length -minlen INTEGER Minimum mean read length required to pass │
│ [default: 49] │
│ --max-contigs INTEGER Maximum contig count required to pass │
│ [default: 500] │
│ --min-assembled-size INTEGER Minimum assembled genome size │
│ --max-assembled-size INTEGER Maximum assembled genome size │
╰──────────────────────────────────────────────────────────────────────────────────────╯
╭─ Additional Options ─────────────────────────────────────────────────────────────────╮
│ --outdir -o PATH Directory to write output [default: ./] │
│ --prefix -p TEXT Prefix to use for output files [default: bactopia] │
│ --force Overwrite existing reports │
│ --verbose Increase the verbosity of output │
│ --silent Only critical errors will be printed │
│ --version -V Show the version and exit. │
│ --help Show this message and exit. │
╰──────────────────────────────────────────────────────────────────────────────────────╯
All The Bacteria (ATB)
The AllTheBacteria is a collection of nearly 2,000,000 bacterial genomes. Using available FASTQ files from the European Nucleotide Archive (ENA) and Sequence Read Archive (SRA), the genomes were assembled using [Shovill] and made publicly available from the Iqbal Lab.
To make it easy to utilize Bactopia Tools with
assemblies from AllTheBacteria, bactopia-atb-formatter
was created. This tool will create a
directory structure that resembles output from an actual Bactopia run.
bactopia-atb-formatter
Usage: bactopia-atb-formatter [OPTIONS]
Restructure All-the-Bacteria assemblies to allow usage with Bactopia Tools
╭─ Required Options ───────────────────────────────────────────────────────────────────╮
│ * --path -p TEXT Directory where FASTQ files are stored [required] │
╰──────────────────────────────────────────────────────────────────────────────────────╯
╭─ Bactopia Directory Structure Options ───────────────────────────────────────────────╮
│ --bactopia-dir -b TEXT The path you would like to place bactopia │
│ structure │
│ [default: bactopia] │
│ --publish-mode -m [symlink|copy] Designates plascement of assemblies will be │
│ handled │
│ [default: symlink] │
│ --recursive -r Traverse recursively through provided path │
╰──────────────────────────────────────────────────────────────────────────────────────╯
╭─ Additional Options ─────────────────────────────────────────────────────────────────╮
│ --verbose Increase the verbosity of output │
│ --silent Only critical errors will be printed │
│ --version -V Show the version and exit. │
│ --help Show this message and exit. │
╰──────────────────────────────────────────────────────────────────────────────────────╯
Example Usage for Legionella pneumophila
To demonstrate the usage of bactopia-atb-formatter
, we will use assemblies for
Legionella pneumophila. The following steps will download the assemblies, build the
Bactopia directory structure, and then run legsta
via the Bactopia Tool.
Download the Assemblies
First will download the Legionella pneumophila assemblies from AllTheBacteria. After downloading
we will extract them into a folder called legionella-assemblies
. Within this folder, there will be
subdirectories for each tarball that was downloaded.
mkdir atb-legionella
cd atb-legionella
# Download the assemblies
wget https://ftp.ebi.ac.uk/pub/databases/AllTheBacteria/Releases/0.1/assembly/legionella_pneumophila__01.asm.tar.xz
wget https://ftp.ebi.ac.uk/pub/databases/AllTheBacteria/Releases/0.1/assembly/legionella_pneumophila__02.asm.tar.xz
# Extract the assemblies
mkdir legionella-assemblies
tar -C legionella-assemblies -xJf legionella_pneumophila__01.asm.tar.xz
tar -C legionella-assemblies -xJf legionella_pneumophila__02.asm.tar.xz
Create the Bactopia Directory Structure
With the assemblies extracted, we can now create the Bactopia directory structure using
bactopia-atb-formatter
. Once complete, each assembly will have its own folder created
which matches the BioSample accession of the assembly.
# Create the Bactopia directory structure
bactopia atb-formatter --path legionella-assemblies --recursive
2024-03-22 14:30:07 INFO 2024-03-22 14:30:07:root:INFO - Setting up Bactopia directory structure (use --verbose to see more details) atb_formatter.py:129
2024-03-22 14:30:08 INFO 2024-03-22 14:30:08:root:INFO - Bactopia directory structure created at bactopia atb_formatter.py:134
INFO 2024-03-22 14:30:08:root:INFO - Total assemblies processed: 5393
Please note the usage of --recursive
which will traverse the legionella-assemblies
directory
to find all assemblies contained. At this point, the bactopia
directory structure has been
created for 5,393 assemblies and is ready for use with Bactopia Tools.
Use Bactopia to run Legsta
As mentioned above, we will use legsta to analyze each of the Legionella pneumophila assemblies. To do this, we will use the legsta Bactopia Tool.
# Run legsta (please utilize Docker or Singularity only for reproducibility)
bactopia --wf legsta -profile singularity
Please note, for reproducibility, it is recommended to use Docker or Singularity with Bactopia Tools.
Upon completion, you should be met with something like the following:
[5d/d04297] process > BACTOPIATOOLS:LEGSTA:LEGSTA_MODULE (SAMN29911258) [100%] 5393 of 5393 ✔
[71/c63bf7] process > BACTOPIATOOLS:LEGSTA:CSVTK_CONCAT (legsta) [100%] 1 of 1 ✔
[16/833262] process > BACTOPIATOOLS:CUSTOM_DUMPSOFTWAREVERSIONS (1) [100%] 1 of 1 ✔
Bactopia Tools: `legsta Execution Summary
---------------------------
Bactopia Version : 3.0.1
Nextflow Version : 23.10.1
Command Line : nextflow run /home/rpetit3/bactopia/main.nf --wf legsta --bactopia bactopia/ -profile singularity
Resumed : false
Completed At : 2024-03-22T15:09:54.959834620-06:00
Duration : 32m 51s
Success : true
Exit Code : 0
Error Report : -
Launch Dir : /home/rpetit3/test-legsta
WARN: Graphviz is required to render the execution DAG in the given format -- See http://www.graphviz.org for more info.
Completed at: 22-Mar-2024 15:09:55
Duration : 32m 52s
CPU hours : 5.2
Succeeded : 5'395
That's it! Now you can take advantage of any of the Bactopia Tools that utilize assemblies as inputs.
Feedback
Your feedback is very valuable! If you run into any issues using Bactopia, have questions, or have some ideas to improve Bactopia, I highly encourage you to submit it to the Issue Tracker.
License
Citation
Petit III RA, Read TD, Bactopia: a flexible pipeline for complete analysis of bacterial genomes. mSystems. 5 (2020), https://doi.org/10.1128/mSystems.00190-20.
Author
- Robert A. Petit III
- Twitter: @rpetit3
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.