BABAPPAsnake
babappasnake is a packaged Snakemake workflow for exploratory orthology-based molecular evolution analysis. It starts from a directory of proteomes plus a query protein, finds RBH-based ortholog candidates, selects a one-to-one orthogroup with mandatory outgroup retention, pauses at a resumable CDS checkpoint, and then runs BABAPPAlign, optional ClipKIT trimming, IQ-TREE 3, HyPhy, selective codeml branch-site tests, BH correction, and executive reporting.
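Reciprocal best hits (RBH) drive the ortholog search. This is a minimal sketch of the RBH criterion, not the workflow's own code. It assumes that similarity-search hits (for example from BLAST or DIAMOND) have already been reduced to (query, subject, bitscore) triples. A pair counts as an RBH when each sequence is the other's top-scoring hit. The function names and input shape are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Hit = Tuple[str, str, float]  # hypothetical shape: (query_id, subject_id, bitscore)


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Map each query to its highest-scoring subject (the first one seen wins ties)."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(forward: Iterable[Hit], reverse: Iterable[Hit]) -> List[Tuple[str, str]]:
    """Return pairs (a, b) where b is a's best forward hit and a is b's best reverse hit."""
    fwd = best_hits(forward)
    rev = best_hits(reverse)
    return sorted((a, b) for a, b in fwd.items() if rev.get(b) == a)


if __name__ == "__main__":
    forward = [("query1", "sp_A", 250.0), ("query1", "sp_B", 180.0)]
    reverse = [("sp_A", "query1", 248.0), ("sp_B", "query1", 175.0)]
    print(reciprocal_best_hits(forward, reverse))  # [('query1', 'sp_A')]
```

Only sp_A qualifies: sp_B's best reverse hit is query1, but query1's best forward hit is sp_A.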
Installation
For a local editable install from this repository:
pip install -e .
For a regular install after publishing:
pip install babappasnake
Confirm the CLI is available:
babappasnake --help
Direct CLI Usage
The direct run form is:
babappasnake \
--input "/path/to/proteomes" \
--query "/path/to/query.fasta" \
--outgroup "Culex quinquefasciatus" \
--clipkit yes \
--iqtree protein
Alternative direct run form with trimming disabled and IQ-TREE on CDS:
babappasnake \
--input "/path/to/proteomes" \
--query "/path/to/query.fasta" \
--outgroup "Culex quinquefasciatus" \
--clipkit no \
--iqtree cds
Required user-facing arguments:
- --input: path to the proteome directory
- --query: query protein FASTA
- --outgroup: exact outgroup taxon name
- --clipkit: yes or no
- --iqtree: protein or cds
Common optional arguments:
- --config /path/to/config.yaml: use a base YAML config and let the CLI override the main run settings
- --threads 8: Snakemake core count
- --output results_run1: results directory
- --coverage-thresholds 50,60,70,80,90,95
- --cds-input /path/to/cds.fasta: seed the checkpoint with an existing CDS file
- --use-conda
- --dry-run
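The arguments listed above can be summarized as an argparse definition. This is a hypothetical mirror written for illustration, not the package's actual parser. It encodes the documented choices for --clipkit and --iqtree. It also assumes --coverage-thresholds takes integer percentages in the range 0-100; that range, along with the sorting and de-duplication, is this sketch's own choice.

```python
import argparse
from typing import List


def parse_thresholds(value: str) -> List[int]:
    """Parse '50,60,70' into sorted, de-duplicated integers (assumed range 0-100)."""
    try:
        thresholds = [int(part) for part in value.split(",") if part.strip()]
    except ValueError as exc:
        raise argparse.ArgumentTypeError(f"not a comma-separated list of integers: {value!r}") from exc
    if not thresholds or any(not 0 <= t <= 100 for t in thresholds):
        raise argparse.ArgumentTypeError(f"thresholds must lie within 0-100: {value!r}")
    return sorted(set(thresholds))


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical mirror of the documented flags; defaults are deliberately left unset.
    parser = argparse.ArgumentParser(prog="babappasnake-sketch")
    parser.add_argument("--input", required=True, help="proteome directory")
    parser.add_argument("--query", required=True, help="query protein FASTA")
    parser.add_argument("--outgroup", required=True, help="exact outgroup taxon name")
    parser.add_argument("--clipkit", required=True, choices=["yes", "no"])
    parser.add_argument("--iqtree", required=True, choices=["protein", "cds"])
    parser.add_argument("--config")
    parser.add_argument("--threads", type=int)
    parser.add_argument("--output")
    parser.add_argument("--coverage-thresholds", type=parse_thresholds)
    parser.add_argument("--cds-input")
    parser.add_argument("--use-conda", action="store_true")
    parser.add_argument("--dry-run", action="store_true")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args([
        "--input", "/path/to/proteomes",
        "--query", "/path/to/query.fasta",
        "--outgroup", "Culex quinquefasciatus",
        "--clipkit", "yes",
        "--iqtree", "protein",
        "--coverage-thresholds", "50,60,70,80,90,95",
    ])
    print(args.clipkit, args.iqtree, args.coverage_thresholds)
```

With `choices` set, argparse rejects a value such as `--clipkit maybe` before the workflow starts.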
Write the bundled default config template without running the workflow:
babappasnake --init-config babappasnake.config.yaml
Validate that the installed package can locate its bundled workflow:
babappasnake --validate-installation
Print the executive summary from a finished run:
babappasnake --summarize /path/to/results
Workflow Routing Logic
BABAPPAlign always produces both of these files after the CDS checkpoint:
- results/alignment/babappalign/protein/aligned_proteins.fasta
- results/alignment/babappalign/cds/aligned_cds.fasta
If --clipkit yes, the workflow runs ClipKIT in kpic-smart-gap mode on the protein alignment and keeps:
- results/alignment/clipkit/protein/trimmed_proteins.fasta
- results/alignment/clipkit/cds/trimmed_cds.fasta
- results/alignment/clipkit/cds/projected_trimmed_cds.fasta
trimmed_cds.fasta is the default, biologically authoritative, codon-safe CDS alignment. ClipKIT produces it directly in codon mode, and it is the CDS alignment that every downstream step uses whenever --clipkit yes is set.
projected_trimmed_cds.fasta is preserved as a protein-guided QC/audit artifact so the retained protein mask can still be compared against the direct codon-mode CDS trim.
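The idea behind a protein-guided projection is simple: each retained protein alignment column maps to one codon, which is three nucleotide columns, in the CDS alignment. The sketch below shows that mapping and is not the workflow's own code. It assumes the two alignments are column-consistent, meaning each aligned CDS is exactly three times the protein alignment length. It also assumes the kept protein columns arrive as a list of 0-based indices. The function name is hypothetical.

```python
from typing import Dict, Sequence


def project_protein_mask(cds_alignment: Dict[str, str], kept_columns: Sequence[int]) -> Dict[str, str]:
    """Keep the codon behind every retained protein column, in the order given."""
    projected = {}
    for seq_id, cds in cds_alignment.items():
        if len(cds) % 3:
            raise ValueError(f"{seq_id}: aligned CDS length {len(cds)} is not a multiple of 3")
        codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
        projected[seq_id] = "".join(codons[col] for col in kept_columns)
    return projected


if __name__ == "__main__":
    cds = {"taxonA": "ATGAAA---TGG", "taxonB": "ATG---CCCTGG"}
    # Suppose the protein trim kept columns 0 and 3 (M and W) and dropped the gappy middle.
    print(project_protein_mask(cds, [0, 3]))  # {'taxonA': 'ATGTGG', 'taxonB': 'ATGTGG'}
```

Comparing this kind of projection with ClipKIT's direct codon-mode trim is what the QC artifact makes possible.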
If --clipkit no, trimming is skipped and downstream steps use the untrimmed BABAPPAlign alignments.
IQ-TREE input selection is explicit:
- --iqtree protein
  - with --clipkit yes: IQ-TREE uses trimmed_proteins.fasta
  - with --clipkit no: IQ-TREE uses aligned_proteins.fasta
- --iqtree cds
  - with --clipkit yes: IQ-TREE uses codon-safe trimmed_cds.fasta
  - with --clipkit no: IQ-TREE uses aligned_cds.fasta
HyPhy and codeml always use the codon-compatible CDS alignment:
- trimmed_cds.fasta when --clipkit yes
- aligned_cds.fasta when --clipkit no
The tree used by HyPhy and codeml is whichever rooted IQ-TREE tree was produced from the user-selected --iqtree mode.
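The routing rules in this section reduce to two lookups. The sketch below assumes the default results/ prefix and uses the file paths listed in this section. The function names are hypothetical, and the workflow's own rules may organize this logic differently.

```python
from pathlib import PurePosixPath
from typing import Tuple

# Assumption: the default "results" directory; substitute your --output value.
RESULTS = PurePosixPath("results")
BABAPPALIGN = RESULTS / "alignment" / "babappalign"
CLIPKIT = RESULTS / "alignment" / "clipkit"


def _require(value: str, allowed: Tuple[str, ...], flag: str) -> None:
    if value not in allowed:
        raise ValueError(f"{flag} must be one of {allowed}, got {value!r}")


def codon_alignment(clipkit: str) -> PurePosixPath:
    """CDS alignment used by HyPhy and codeml, and by IQ-TREE in cds mode.

    projected_trimmed_cds.fasta is a QC artifact and is never routed downstream.
    """
    _require(clipkit, ("yes", "no"), "--clipkit")
    if clipkit == "yes":
        return CLIPKIT / "cds" / "trimmed_cds.fasta"
    return BABAPPALIGN / "cds" / "aligned_cds.fasta"


def iqtree_input(clipkit: str, iqtree: str) -> PurePosixPath:
    """Alignment handed to IQ-TREE for a given --clipkit / --iqtree combination."""
    _require(clipkit, ("yes", "no"), "--clipkit")
    _require(iqtree, ("protein", "cds"), "--iqtree")
    if iqtree == "cds":
        return codon_alignment(clipkit)
    if clipkit == "yes":
        return CLIPKIT / "protein" / "trimmed_proteins.fasta"
    return BABAPPALIGN / "protein" / "aligned_proteins.fasta"


if __name__ == "__main__":
    for clipkit in ("yes", "no"):
        for iqtree in ("protein", "cds"):
            print(f"--clipkit {clipkit} --iqtree {iqtree}: IQ-TREE <- {iqtree_input(clipkit, iqtree)}")
        print(f"--clipkit {clipkit}: HyPhy/codeml <- {codon_alignment(clipkit)}")
```

The codon-compatible alignment depends only on --clipkit. This is why HyPhy and codeml never see a protein alignment, whichever --iqtree mode is selected.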
Checkpoint And Resume
The workflow intentionally pauses after orthogroup selection. It writes a checkpoint request directory containing the required member IDs and a README describing the expected CDS input.
When you reach the checkpoint, place the CDS FASTA at:
results/cds_input_checkpoint/request/user_supplied_cds.fasta
Then rerun the exact same babappasnake ... command. Snakemake resumes from the checkpoint.
If you already have the CDS FASTA before the run starts, pass it with --cds-input so the checkpoint is pre-seeded automatically.
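Before rerunning, you can optionally confirm that the CDS FASTA placed at the checkpoint covers every required member. This is a convenience pre-flight sketch, not the workflow's own validation step. It assumes you have already copied the required member IDs from the request directory into a Python set; the name and format of that ID list are not specified here. It also assumes each FASTA ID is the first whitespace-delimited token after ">". The helper names are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Dict, List, Optional, Set


def read_fasta(path: Path) -> Dict[str, str]:
    """Minimal FASTA reader: the ID is the first token of each header line."""
    records: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for raw in path.read_text().splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            current = header[0] if header else ""
            records[current] = []
        elif current is not None:
            records[current].append(line)
    return {seq_id: "".join(parts) for seq_id, parts in records.items()}


def check_checkpoint_cds(path: Path, required_ids: Set[str]) -> List[str]:
    """Return a list of problems; an empty list means the file looks usable."""
    seqs = read_fasta(path)
    present = set(seqs)
    problems = [f"missing CDS for {seq_id}" for seq_id in sorted(required_ids - present)]
    for seq_id in sorted(required_ids & present):
        if len(seqs[seq_id]) % 3:
            problems.append(f"{seq_id}: length {len(seqs[seq_id])} is not a multiple of 3")
    return problems


if __name__ == "__main__":
    # Demo on a throwaway file; for real use, point `path` at
    # results/cds_input_checkpoint/request/user_supplied_cds.fasta.
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "user_supplied_cds.fasta"
        path.write_text(">gene_a\nATGAAATGG\n>gene_b\nATGAA\n")
        for problem in check_checkpoint_cds(path, {"gene_a", "gene_b", "gene_c"}):
            print(problem)
```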
Output Structure
results/
  rbh/
  threshold_comparison/
  selected_orthogroup/
  cds_input_checkpoint/
    request/
    validated/
  alignment/
    babappalign/
      protein/
      cds/
    clipkit/
      protein/
      cds/
  iqtree/
  hyphy/
    absrel/
    busted/
    meme/
  codeml/
  reports/
  logs/
Key outputs:
- results/selected_orthogroup/selection_report.txt
- results/alignment/alignment_validation.json
- results/iqtree/rooted_labeled.treefile
- results/hyphy/absrel/absrel_branch_summary.tsv
- results/hyphy/busted/busted_summary.json
- results/hyphy/meme/meme_sites.tsv
- results/codeml/codeml_branchsite_summary.tsv
- results/reports/executive_summary.txt
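The workflow applies Benjamini-Hochberg (BH) correction to the selective codeml branch-site tests. This page does not describe the column layout of codeml_branchsite_summary.tsv or how adjusted values appear in it, so the sketch below works on a plain list of p-values. It shows the standard BH step-up adjustment: for m tests sorted in ascending order, the adjusted value at rank i is the minimum over ranks j >= i of p_(j) * m / j, capped at 1. It is an illustration, not the workflow's own code.

```python
from typing import List, Sequence


def bh_adjust(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p-value
    adjusted = [0.0] * m
    running_min = 1.0  # starting at 1.0 also caps every adjusted value at 1
    # Walk from the largest p-value down, so each value is the minimum over higher ranks.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    print([round(q, 4) for q in bh_adjust([0.01, 0.04, 0.03, 0.20])])
    # [0.04, 0.0533, 0.0533, 0.2]
```

In the example, the raw value 0.03 is adjusted to 0.0533 rather than 0.06. This is because the step-up rule carries the smaller value computed at the higher rank (0.04 * 4 / 3) down to it.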
Configuration
The packaged default config lives at config/config.yaml in the source tree and is bundled into the installed package. The CLI writes a merged runtime copy under:
<results_dir>/.babappasnake/runtime_config.yaml
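The merged runtime copy combines the base YAML with CLI overrides. The sketch below illustrates that kind of merge on nested dictionaries: CLI values win, and options not given on the command line (None) leave the base value alone. The keys in the demo are hypothetical placeholders, because the schema of the bundled config/config.yaml is not reproduced here. To keep the sketch standard-library only, it leaves out YAML loading and writing, for which PyYAML would be a common choice.

```python
import copy
from typing import Any, Dict


def merge_config(base: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively merge overrides into a copy of base; non-None overrides win."""
    merged = copy.deepcopy(base)  # never mutate the caller's base config
    for key, value in overrides.items():
        if value is None:
            continue  # option not supplied on the command line
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


if __name__ == "__main__":
    # Hypothetical keys for illustration only.
    base = {"clipkit": "no", "iqtree": "protein", "threads": 4, "hyphy": {"absrel": True, "meme": True}}
    cli = {"clipkit": "yes", "iqtree": None, "threads": 8, "hyphy": {"meme": False}}
    print(merge_config(base, cli))
    # {'clipkit': 'yes', 'iqtree': 'protein', 'threads': 8, 'hyphy': {'absrel': True, 'meme': False}}
```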
You can still run the workflow directly with Snakemake if needed:
python -m snakemake \
--snakefile Snakefile \
--configfile config/config.yaml \
--cores 8 \
--rerun-incomplete
Development Notes
Build a distributable package locally:
python -m build
Run the tests:
python -m pytest -q
The package build refreshes the bundled workflow assets automatically so the installed babappasnake command can launch Snakemake without needing the source tree.
Download files
File details
Details for the file babappasnake-0.1.0.tar.gz.
File metadata
- Download URL: babappasnake-0.1.0.tar.gz
- Upload date:
- Size: 72.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.19
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6ed7462c1341baaa76921f99e407fb7841ee167e539b0fa7a46a12302c6f0729 |
| MD5 | bd2456b05d34fa7a73653f42e145cda3 |
| BLAKE2b-256 | 07415aaebc0e9e77982ea0bc03fb37c0885ff87396bd6ff6aff9da29e12254d0 |
File details
Details for the file babappasnake-0.1.0-py3-none-any.whl.
File metadata
- Download URL: babappasnake-0.1.0-py3-none-any.whl
- Upload date:
- Size: 56.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.19
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c367233da0cd70d7a0e9f88c22f3785071376800c02761d1560a8245e0697232 |
| MD5 | 5cfebdf75cf006c8f57ebd1bc8c69530 |
| BLAKE2b-256 | feaef4464dc6dce2b90c99c210ba8f362bfa0037febfd2782696867eac2d90c5 |
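To check a downloaded distribution against the SHA256 digests published for babappasnake-0.1.0, hash the local file and compare. This is a minimal standard-library sketch using hashlib; the helper name is illustrative. It reads the file in chunks so that large files are never loaded into memory all at once.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Hex digest of a file, computed over fixed-size chunks."""
    h = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


EXPECTED_SHA256 = {
    "babappasnake-0.1.0.tar.gz": "6ed7462c1341baaa76921f99e407fb7841ee167e539b0fa7a46a12302c6f0729",
    "babappasnake-0.1.0-py3-none-any.whl": "c367233da0cd70d7a0e9f88c22f3785071376800c02761d1560a8245e0697232",
}


if __name__ == "__main__":
    for name, expected in EXPECTED_SHA256.items():
        path = Path(name)
        if not path.exists():
            print(f"{name}: not found in the current directory, skipping")
            continue
        status = "OK" if file_digest(path) == expected else "MISMATCH"
        print(f"{name}: {status}")
```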