
Workflow for whole genome sequencing based phylogeny of Illumina and ONT data.


PACU

PACU is a workflow for whole-genome sequencing (WGS)-based phylogeny of Illumina and ONT R9/R10 data.

PACU stands for the Prokaryotic Awesome variant Calling Utility and is named after an omnivorous fish (that eats both Illumina and ONT reads).

PACU is also available on our public Galaxy instance (registration required).


INSTALLATION

Conda installation

conda install -c bioconda -c conda-forge pacu_snp

If the above command fails, PACU can be installed in a new environment using the following commands:

conda create -n pacu_snp python=3.10
conda activate pacu_snp
conda install bioconda::pacu_snp -c bioconda -c conda-forge

Note: MEGA is currently not available through Conda. It can be installed manually from the MEGA website, or IQ-TREE (the default tree builder) can be used instead.

Manual installation

The PACU workflow has the following dependencies:

The mapping script has the following additional dependencies:

The corresponding binaries should be in your PATH to run the workflow. Other versions of these tools may work, but have not been tested.
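
Because the workflow expects its dependencies on PATH, a quick check before the first run can save a failed job. The helper below is a minimal sketch and not part of PACU: check_deps is a hypothetical function that only reports which of the executable names you pass it cannot be found.

```shell
# check_deps TOOL [TOOL...]
# Hypothetical helper: prints "found:" or "missing:" for each executable name
# and returns 1 if any of them is not on PATH.
check_deps() {
    local missing=0 tool
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            printf 'found:   %s\n' "$tool"
        else
            printf 'missing: %s\n' "$tool" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, check_deps samtools bcftools iqtree2. These binary names are assumptions based on the tools mentioned in the usage text (samtools, bcftools, IQ-TREE); adjust them to the dependencies you actually installed.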

The required Python packages are listed in the requirements.txt file. Python 3.9 or 3.10 is recommended for a manual installation.

virtualenv pacu_env --python=python3.10
. pacu_env/bin/activate
pip install pacu_snp

USAGE

usage: PACU [-h] [--ilmn-in ILMN_IN] [--ont-in ONT_IN] --ref-fasta REF_FASTA [--ref-bed REF_BED] [--dir-working DIR_WORKING] --output OUTPUT
            [--output-html OUTPUT_HTML] [--use-mega] [--include-ref] [--min-snp-af MIN_SNP_AF] [--min-snp-qual MIN_SNP_QUAL]
            [--min-snp-depth MIN_SNP_DEPTH] [--min-snp-dist MIN_SNP_DIST] [--min-global-depth MIN_GLOBAL_DEPTH] [--min-mq-depth MIN_MQ_DEPTH]
            [--skip-gubbins] [--bcftools-filt-af1] [--image-width IMAGE_WIDTH] [--image-height IMAGE_HEIGHT] [--threads THREADS] [--version]

options:
  -h, --help            show this help message and exit
  --ilmn-in ILMN_IN     Directory with Illumina input BAM files
  --ont-in ONT_IN       Directory with ONT input BAM files
  --ref-fasta REF_FASTA
                        Reference FASTA file
  --ref-bed REF_BED     BED file with phage regions
  --dir-working DIR_WORKING
                        Working directory
  --output OUTPUT       Output directory
  --output-html OUTPUT_HTML
                        Output report name
  --use-mega            If set, MEGA is used for the construction of the phylogeny (instead of IQ-TREE)
  --include-ref         If set, the reference genome is included in the phylogeny
  --min-snp-af MIN_SNP_AF
                        Minimum allele frequency for variants
  --min-snp-qual MIN_SNP_QUAL
                        Minimum SNP quality
  --min-snp-depth MIN_SNP_DEPTH
                        Minimum SNP depth
  --min-snp-dist MIN_SNP_DIST
                        Minimum distance between SNPs
  --min-global-depth MIN_GLOBAL_DEPTH
                        Minimum depth for all samples to include positions in SNP analysis
  --min-mq-depth MIN_MQ_DEPTH
                        MQ cutoff for samtools depth
  --skip-gubbins        If set, gubbins is skipped
  --bcftools-filt-af1   If enabled, allele frequency filtering also considers the VAF value
  --image-width IMAGE_WIDTH
                        Image width
  --image-height IMAGE_HEIGHT
                        Image height
  --threads THREADS     Number of threads
  --version             Print version and exit

Note: The location of the temporary directory can be changed by setting the TMPDIR environment variable.
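
Setting TMPDIR only for the PACU process leaves the rest of the shell session unchanged. A minimal sketch, assuming a POSIX shell; run_with_tmpdir is a hypothetical helper, not part of PACU:

```shell
# run_with_tmpdir DIR COMMAND [ARGS...]
# Hypothetical helper: creates DIR if needed and runs COMMAND with TMPDIR set
# to DIR in that command's environment only; the caller's TMPDIR is untouched.
run_with_tmpdir() {
    local dir="$1"
    shift
    mkdir -p "$dir" || return 1
    TMPDIR="$dir" "$@"
}
```

For example, run_with_tmpdir /scratch/pacu_tmp PACU --ilmn-in in/ilmn/ --ref-fasta ref.fasta --output output/ --threads 8, where /scratch/pacu_tmp stands in for any disk with enough free space.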

Basic usage example

The PACU workflow takes BAM files with reads mapped to a reference genome as input. Illumina data is provided with the --ilmn-in option and ONT data with the --ont-in option; each option takes a directory of BAM files, and both can be used in the same run.

PACU \
    --ilmn-in in/ilmn/ \
    --ont-in in/ont/ \
    --ref-fasta ref.fasta \
    --output output/ \
    --dir-working work/ \
    --threads 8
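
When the same analysis is repeated with different optional settings, it can help to assemble the command in one place. The wrapper below is a sketch under stated assumptions: run_pacu and the environment variable names (ILMN_DIR, ONT_DIR, REF_BED, USE_MEGA, INCLUDE_REF, THREADS, DRY_RUN) are hypothetical, the default of 8 threads is arbitrary, and only options from the usage text above are passed to PACU.

```shell
# run_pacu REF_FASTA OUTPUT_DIR WORK_DIR
# Hypothetical wrapper (not part of PACU) that builds the PACU argument list:
#   ILMN_DIR / ONT_DIR  directories with Illumina / ONT BAM files
#   REF_BED             optional BED file with phage regions (--ref-bed)
#   USE_MEGA=1          use MEGA instead of IQ-TREE (--use-mega)
#   INCLUDE_REF=1       include the reference genome in the phylogeny (--include-ref)
#   THREADS             number of threads (8 if unset, an arbitrary default)
#   DRY_RUN=1           print the command instead of running it
run_pacu() {
    local ref="$1" out="$2" work="$3"
    # Assumption: at least one input directory is needed for a meaningful run.
    if [ -z "${ILMN_DIR:-}" ] && [ -z "${ONT_DIR:-}" ]; then
        echo "run_pacu: set ILMN_DIR and/or ONT_DIR" >&2
        return 1
    fi
    # Reuse the positional parameters as the PACU argument list.
    set -- --ref-fasta "$ref" --output "$out" --dir-working "$work" \
        --threads "${THREADS:-8}"
    if [ -n "${ILMN_DIR:-}" ]; then set -- "$@" --ilmn-in "$ILMN_DIR"; fi
    if [ -n "${ONT_DIR:-}" ]; then set -- "$@" --ont-in "$ONT_DIR"; fi
    if [ -n "${REF_BED:-}" ]; then set -- "$@" --ref-bed "$REF_BED"; fi
    if [ "${USE_MEGA:-}" = 1 ]; then set -- "$@" --use-mega; fi
    if [ "${INCLUDE_REF:-}" = 1 ]; then set -- "$@" --include-ref; fi

    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s' PACU
        printf ' %s' "$@"
        printf '\n'
    else
        PACU "$@"
    fi
}
```

For example, ILMN_DIR=in/ilmn/ ONT_DIR=in/ont/ run_pacu ref.fasta output/ work/ runs the same analysis as the basic example, and prefixing it with DRY_RUN=1 prints the command first. Building the list with set -- keeps paths containing spaces intact, which plain string concatenation would not.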

Read mapping

The PACU_map script maps Illumina or ONT reads to a reference genome in FASTA format. The resulting BAM files can be used as input for the PACU workflow. Add the --trim option to trim reads before mapping.

Illumina data

PACU_map \
    --ref-fasta genome.fasta \
    --read-type illumina \
    --fastq-illumina reads_1.fastq.gz reads_2.fastq.gz \
    --output mapped.bam \
    --threads 4

ONT data

PACU_map \
    --ref-fasta genome.fasta \
    --read-type ont \
    --fastq-ont reads_ont.fastq.gz \
    --output mapped.bam \
    --threads 4
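
To prepare a batch of samples, PACU_map can be called once per sample, writing the BAM files into the directories that --ilmn-in and --ont-in expect. The loop below is a sketch under stated assumptions: map_all is a hypothetical helper, the FASTQ naming convention (<sample>_1.fastq.gz and <sample>_2.fastq.gz for Illumina, <sample>.ont.fastq.gz for ONT) is invented for the example, and only the PACU_map options shown above are used.

```shell
# map_all REF_FASTA FASTQ_DIR OUT_DIR
# Hypothetical helper: maps every sample in FASTQ_DIR with PACU_map and writes
# OUT_DIR/ilmn/<sample>.bam and OUT_DIR/ont/<sample>.bam.
# ${DRY_RUN:+echo} expands to "echo" when DRY_RUN is set, so the commands are
# printed instead of executed (the output directories are still created).
map_all() {
    local ref="$1" fastq_dir="$2" out_dir="$3" r1 r2 ont sample
    mkdir -p "$out_dir/ilmn" "$out_dir/ont" || return 1
    # Illumina: assumed paired files <sample>_1.fastq.gz / <sample>_2.fastq.gz
    for r1 in "$fastq_dir"/*_1.fastq.gz; do
        [ -e "$r1" ] || continue            # the glob matched nothing
        sample="$(basename "$r1" _1.fastq.gz)"
        r2="$fastq_dir/${sample}_2.fastq.gz"
        if [ ! -e "$r2" ]; then
            echo "skipping $sample: missing $r2" >&2
            continue
        fi
        ${DRY_RUN:+echo} PACU_map --ref-fasta "$ref" --read-type illumina \
            --fastq-illumina "$r1" "$r2" --output "$out_dir/ilmn/$sample.bam" \
            --threads "${THREADS:-4}" || return 1
    done
    # ONT: assumed single file <sample>.ont.fastq.gz
    for ont in "$fastq_dir"/*.ont.fastq.gz; do
        [ -e "$ont" ] || continue
        sample="$(basename "$ont" .ont.fastq.gz)"
        ${DRY_RUN:+echo} PACU_map --ref-fasta "$ref" --read-type ont \
            --fastq-ont "$ont" --output "$out_dir/ont/$sample.bam" \
            --threads "${THREADS:-4}" || return 1
    done
}
```

For example, map_all genome.fasta fastq/ in/ fills in/ilmn/ and in/ont/, which can then be passed to PACU with --ilmn-in in/ilmn/ --ont-in in/ont/. Adding --trim to the PACU_map calls enables read trimming before mapping.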

TESTING

A test dataset is available under resources/testdata/bam. These files contain Escherichia coli reads mapped to a small part of the E. coli NC_002695.2 genome. This is not a real dataset and should only be used for testing.

The complete workflow can be tested using the following command:

pytest --log-cli-level=DEBUG pacu/tests/test_workflow.py

CONTACT

Create an issue to report bugs, propose new functions or ask for help.

CITATION

If you use this tool, please consider citing our publication.


Copyright - 2024 Bert Bogaerts bert.bogaerts@sciensano.be



Download files


Source Distribution

pacu_snp-1.0.0.tar.gz (2.0 MB)

Uploaded Source

Built Distribution


pacu_snp-1.0.0-py3-none-any.whl (2.0 MB)

Uploaded Python 3

File details

Details for the file pacu_snp-1.0.0.tar.gz.

File metadata

  • Download URL: pacu_snp-1.0.0.tar.gz
  • Upload date:
  • Size: 2.0 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.12

File hashes

Hashes for pacu_snp-1.0.0.tar.gz
Algorithm Hash digest
SHA256 c18b9935bc855e1b3ef8c31975ed8d32254153a3e94f925da69eb8fd61b739db
MD5 013f16d6db670cafc1478f4e709420cb
BLAKE2b-256 20f9a643b45c1353bd33c5523dd033fa735593c88b474bc8ba90c0a78e63082f


File details

Details for the file pacu_snp-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: pacu_snp-1.0.0-py3-none-any.whl
  • Upload date:
  • Size: 2.0 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.12

File hashes

Hashes for pacu_snp-1.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 d5af1b478385e774fcd833353f3c6a05e580308407f37c13cf9e2041344086ec
MD5 3e721e83db723d943d9ffb61aceb7237
BLAKE2b-256 3748e8311f483bb0cae672e52ba9c547602caa70f6739e6ea20c56196f553931
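
The SHA256 digests listed for each file can be used to check a download before installing it. A minimal sketch, assuming a POSIX shell with either sha256sum (GNU coreutils) or shasum (shipped with macOS) available; verify_sha256 is a hypothetical helper:

```shell
# verify_sha256 FILE EXPECTED_HEX
# Hypothetical helper: compares the SHA256 digest of FILE with EXPECTED_HEX
# and returns 1 on a mismatch or if FILE cannot be read.
verify_sha256() {
    local file="$1" expected="$2" actual
    if [ ! -r "$file" ]; then
        echo "cannot read $file" >&2
        return 1
    fi
    # Both tools print "<digest>  <file>"; keep only the digest.
    if command -v sha256sum >/dev/null 2>&1; then
        actual="$(sha256sum "$file")"
    else
        actual="$(shasum -a 256 "$file")"
    fi
    actual="${actual%% *}"
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file (got $actual)" >&2
        return 1
    fi
}
```

For example, verify_sha256 pacu_snp-1.0.0.tar.gz c18b9935bc855e1b3ef8c31975ed8d32254153a3e94f925da69eb8fd61b739db. pip can enforce the same check at install time in hash-checking mode (--require-hashes, with --hash=sha256:... entries in a requirements file).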

