
Packaged Snakemake workflow for exploratory orthology-based molecular evolution analysis


BABAPPAsnake

babappasnake is a packaged Snakemake workflow for exploratory orthology-based molecular evolution analysis. It starts from a directory of proteomes plus a query protein, finds ortholog candidates by reciprocal best hit (RBH), selects a one-to-one orthogroup with mandatory outgroup retention, and pauses at a resumable CDS checkpoint. Once the CDS input is supplied, it runs BABAPPAlign, optional ClipKIT trimming, IQ-TREE 3, HyPhy, selective codeml branch-site tests, Benjamini-Hochberg (BH) correction, and executive reporting.
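
For readers unfamiliar with the RBH criterion, here is a minimal Python sketch of the idea. It is not the packaged implementation: the function name, the best-hit tables, and the sequence IDs are all hypothetical, and a real run would derive the best hits from a sequence search.

```python
from typing import Dict, Set, Tuple


def reciprocal_best_hits(
    a_to_b: Dict[str, str],
    b_to_a: Dict[str, str],
) -> Set[Tuple[str, str]]:
    """Return (a, b) pairs where a's best hit is b and b's best hit is a."""
    return {(a, b) for a, b in a_to_b.items() if b_to_a.get(b) == a}


# Toy best-hit tables with hypothetical sequence IDs.
query_to_proteome = {"q1": "p7", "q2": "p3"}
proteome_to_query = {"p7": "q1", "p3": "q9"}

pairs = reciprocal_best_hits(query_to_proteome, proteome_to_query)
print(sorted(pairs))
```

Only q1/p7 survives here, because p3's best hit points back to a different sequence than q2.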

Installation

For a local editable install from this repository:

pip install -e .

For a regular install from PyPI:

pip install babappasnake

Confirm the CLI is available:

babappasnake --help

Direct CLI Usage

The direct run form is:

babappasnake \
  --input "/path/to/proteomes" \
  --query "/path/to/query.fasta" \
  --outgroup "Culex quinquefasciatus" \
  --clipkit yes \
  --iqtree protein

Alternative direct run form with trimming disabled and IQ-TREE on CDS:

babappasnake \
  --input "/path/to/proteomes" \
  --query "/path/to/query.fasta" \
  --outgroup "Culex quinquefasciatus" \
  --clipkit no \
  --iqtree cds

Required user-facing arguments:

  • --input: path to the proteome directory
  • --query: query protein FASTA
  • --outgroup: exact outgroup taxon name
  • --clipkit: yes or no
  • --iqtree: protein or cds

Common optional arguments:

  • --config /path/to/config.yaml: use a base YAML config and let the CLI override the main run settings
  • --threads 8: Snakemake core count
  • --output results_run1: results directory
  • --coverage-thresholds 50,60,70,80,90,95
  • --cds-input /path/to/cds.fasta: seed the checkpoint with an existing CDS file
  • --use-conda
  • --dry-run
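
When launching runs from a script, it can help to assemble the argument list programmatically. The sketch below uses only flags listed above; the helper name build_command and its parameters are illustrative and are not part of babappasnake.

```python
import shlex
from typing import List, Optional


def build_command(
    input_dir: str,
    query: str,
    outgroup: str,
    clipkit: bool,
    iqtree: str,
    threads: Optional[int] = None,
    output: Optional[str] = None,
    dry_run: bool = False,
) -> List[str]:
    """Assemble a babappasnake argument list from the documented flags."""
    if iqtree not in ("protein", "cds"):
        raise ValueError("iqtree must be 'protein' or 'cds'")
    cmd = [
        "babappasnake",
        "--input", input_dir,
        "--query", query,
        "--outgroup", outgroup,
        "--clipkit", "yes" if clipkit else "no",
        "--iqtree", iqtree,
    ]
    if threads is not None:
        cmd += ["--threads", str(threads)]
    if output is not None:
        cmd += ["--output", output]
    if dry_run:
        cmd.append("--dry-run")
    return cmd


cmd = build_command(
    "/path/to/proteomes",
    "/path/to/query.fasta",
    "Culex quinquefasciatus",
    clipkit=True,
    iqtree="protein",
    threads=8,
    dry_run=True,
)
# shlex.join quotes the outgroup name so the printed line is shell-safe.
print(shlex.join(cmd))
```

The returned list can be passed to subprocess.run(cmd, check=True) without a shell, which avoids quoting problems with taxon names that contain spaces.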

Write the bundled default config template without running the workflow:

babappasnake --init-config babappasnake.config.yaml

Validate that the installed package can locate its bundled workflow:

babappasnake --validate-installation

Print the executive summary from a finished run:

babappasnake --summarize /path/to/results

Workflow Routing Logic

BABAPPAlign always produces both of these files after the CDS checkpoint:

  • results/alignment/babappalign/protein/aligned_proteins.fasta
  • results/alignment/babappalign/cds/aligned_cds.fasta

If --clipkit yes, the workflow runs ClipKIT in kpic-smartgap mode on the protein alignment and keeps:

  • results/alignment/clipkit/protein/trimmed_proteins.fasta
  • results/alignment/clipkit/cds/trimmed_cds.fasta
  • results/alignment/clipkit/cds/projected_trimmed_cds.fasta

trimmed_cds.fasta is the default, biologically authoritative, codon-safe CDS alignment. ClipKIT produces it directly in codon mode, and every downstream step that takes a CDS alignment uses it when --clipkit yes.

projected_trimmed_cds.fasta is preserved as a protein-guided QC/audit artifact so the retained protein mask can still be compared against the direct codon-mode CDS trim.

If --clipkit no, trimming is skipped and downstream steps use the untrimmed BABAPPAlign alignments.

IQ-TREE input selection is explicit:

  • --iqtree protein
    • with --clipkit yes: IQ-TREE uses trimmed_proteins.fasta
    • with --clipkit no: IQ-TREE uses aligned_proteins.fasta
  • --iqtree cds
    • with --clipkit yes: IQ-TREE uses codon-safe trimmed_cds.fasta
    • with --clipkit no: IQ-TREE uses aligned_cds.fasta

HyPhy and codeml always use the codon-compatible CDS alignment:

  • trimmed_cds.fasta when --clipkit yes
  • aligned_cds.fasta when --clipkit no

The tree used by HyPhy and codeml is whichever rooted IQ-TREE tree was produced from the user-selected --iqtree mode.
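
The routing rules above can be summarized as a small lookup. This Python sketch only restates the documented file choices; the function name route_alignments and the dictionary keys are illustrative and do not come from the workflow code.

```python
from typing import Dict

BABAPPALIGN = "results/alignment/babappalign"
CLIPKIT = "results/alignment/clipkit"


def route_alignments(clipkit: bool, iqtree: str) -> Dict[str, str]:
    """Return the alignment used by IQ-TREE and the one used by HyPhy/codeml."""
    if iqtree not in ("protein", "cds"):
        raise ValueError("iqtree must be 'protein' or 'cds'")
    if clipkit:
        protein = f"{CLIPKIT}/protein/trimmed_proteins.fasta"
        cds = f"{CLIPKIT}/cds/trimmed_cds.fasta"
    else:
        protein = f"{BABAPPALIGN}/protein/aligned_proteins.fasta"
        cds = f"{BABAPPALIGN}/cds/aligned_cds.fasta"
    return {
        # IQ-TREE follows the user-selected --iqtree mode.
        "iqtree_input": protein if iqtree == "protein" else cds,
        # HyPhy and codeml always take the CDS alignment.
        "codon_input": cds,
    }


for clipkit in (True, False):
    for mode in ("protein", "cds"):
        print(clipkit, mode, route_alignments(clipkit, mode))
```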

Checkpoint And Resume

The workflow intentionally pauses after orthogroup selection. It writes a checkpoint request directory containing the required member IDs and a README describing the expected CDS input.

When you reach the checkpoint, place the CDS FASTA at:

  • results/cds_input_checkpoint/request/user_supplied_cds.fasta

Then rerun the exact same babappasnake ... command. Snakemake resumes from the checkpoint.

If you already have the CDS FASTA before the run starts, pass it with --cds-input so the checkpoint is pre-seeded automatically.
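
If you place the file by hand, a small pre-check can catch obvious problems before rerunning. The following Python sketch copies a CDS FASTA to the checkpoint path given above. The helper names are hypothetical, and the multiple-of-three length check is an assumption about what a sensible CDS looks like, not a description of the workflow's own validation.

```python
import shutil
import tempfile
from pathlib import Path

# Path from the README, relative to the results directory.
CHECKPOINT_RELPATH = Path("cds_input_checkpoint/request/user_supplied_cds.fasta")


def read_fasta(path):
    """Parse a FASTA file into {record_id: sequence}."""
    records = {}
    name = None
    with open(path) as handle:
        for raw in handle:
            line = raw.strip()
            if not line:
                continue
            if line.startswith(">"):
                fields = line[1:].split()
                name = fields[0] if fields else ""
                records[name] = ""
            elif name is not None:
                records[name] += line
    return records


def place_cds(cds_fasta, results_dir="results"):
    """Copy a CDS FASTA to the checkpoint location after a basic sanity check."""
    records = read_fasta(cds_fasta)
    if not records:
        raise ValueError("no FASTA records found")
    # Assumption: coding sequences should be a whole number of codons.
    bad = [name for name, seq in records.items() if len(seq) % 3 != 0]
    if bad:
        raise ValueError(f"length not a multiple of 3: {', '.join(bad)}")
    target = Path(results_dir) / CHECKPOINT_RELPATH
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(cds_fasta, target)
    return target


# Demo in a throwaway directory with a hypothetical record ID.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "my_cds.fasta"
    source.write_text(">gene1\nATGAAATAA\n")
    placed = place_cds(source, Path(tmp) / "results")
    print(placed.relative_to(tmp))
```

The record IDs still have to match the member IDs listed in the checkpoint request directory; this sketch does not check that.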

Output Structure

results/
  rbh/
  threshold_comparison/
  selected_orthogroup/
  cds_input_checkpoint/
    request/
    validated/
  alignment/
    babappalign/
      protein/
      cds/
    clipkit/
      protein/
      cds/
  iqtree/
  hyphy/
    absrel/
    busted/
    meme/
  codeml/
  reports/
  logs/

Key outputs:

  • results/selected_orthogroup/selection_report.txt
  • results/alignment/alignment_validation.json
  • results/iqtree/rooted_labeled.treefile
  • results/hyphy/absrel/absrel_branch_summary.tsv
  • results/hyphy/busted/busted_summary.json
  • results/hyphy/meme/meme_sites.tsv
  • results/codeml/codeml_branchsite_summary.tsv
  • results/reports/executive_summary.txt
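
To check quickly whether a run reached the end, one option is to test for the key outputs listed above. This is a hedged Python sketch: the paths are taken from that list, while the helper name missing_outputs is illustrative and not part of the package.

```python
import tempfile
from pathlib import Path

# Key outputs from the README, relative to the results directory.
KEY_OUTPUTS = [
    "selected_orthogroup/selection_report.txt",
    "alignment/alignment_validation.json",
    "iqtree/rooted_labeled.treefile",
    "hyphy/absrel/absrel_branch_summary.tsv",
    "hyphy/busted/busted_summary.json",
    "hyphy/meme/meme_sites.tsv",
    "codeml/codeml_branchsite_summary.tsv",
    "reports/executive_summary.txt",
]


def missing_outputs(results_dir):
    """Return the key outputs that are not present under results_dir."""
    root = Path(results_dir)
    return [rel for rel in KEY_OUTPUTS if not (root / rel).is_file()]


# Demo against an empty throwaway directory: every key output is missing.
with tempfile.TemporaryDirectory() as tmp:
    for rel in missing_outputs(tmp):
        print("missing:", rel)
```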

Configuration

The packaged default config lives at config/config.yaml in the source tree and is bundled into the installed package. The CLI writes a merged runtime copy under:

  • <results_dir>/.babappasnake/runtime_config.yaml
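
The merge can be pictured as CLI values layered over the base config. The Python sketch below shows that idea with plain dictionaries. The key names are hypothetical, and the rule that unset CLI options leave the base value untouched is an assumption; the actual schema is whatever the bundled config/config.yaml defines.

```python
from typing import Any, Dict, Optional


def merge_config(
    base: Dict[str, Any],
    overrides: Dict[str, Optional[Any]],
) -> Dict[str, Any]:
    """Return a copy of base with every non-None override applied."""
    merged = dict(base)
    for key, value in overrides.items():
        # Assumption: None means the option was not given on the command line.
        if value is not None:
            merged[key] = value
    return merged


# Hypothetical keys, for illustration only.
base = {"threads": 4, "clipkit": "yes", "iqtree": "protein"}
cli = {"threads": 8, "clipkit": None, "iqtree": "cds"}

runtime = merge_config(base, cli)
print(runtime)
```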

You can still run the workflow directly with Snakemake if needed:

python -m snakemake \
  --snakefile Snakefile \
  --configfile config/config.yaml \
  --cores 8 \
  --rerun-incomplete

Development Notes

Build a distributable package locally:

python -m build

Run the tests:

python -m pytest -q

The package build refreshes the bundled workflow assets automatically so the installed babappasnake command can launch Snakemake without needing the source tree.
