SimPhyni: a tool for phylogenetic trait simulation and inference.
Project description
SimPhyNI
Overview
SimPhyNI (Simulation-based Phylogenetic iNteraction Inference) is a phylogenetically-aware framework for detecting evolutionary associations between binary traits (e.g., gene presence/absence, major/minor alleles, binary phenotypes) on microbial phylogenetic trees. This tool leverages phylogenetic information to correct for spurious associations caused by the relatedness of sister taxa.
This pipeline is designed to:
- Infer evolutionary parameters for traits (gain/loss rates, time to emergence, ancestral states)
- Estimate trait co-occurence null models through independent simulation of traits
- Output statistical results for associations
Getting Started
Installation
First, ensure bioconda and conda-forge are channels are configured:
conda config --add channels conda-forge
conda config --add channels bioconda
Create a new environment:
conda create -n simphyni
conda activate simphyni
then install SimPhyNI from bioconda:
conda install simphyni
test installation:
simphyni version
Input Specifications
1. Phylogenetic Tree (.nwk)
- Standard Newick format.
- Must be rooted (both outgroup and midpoint are acceptable).
- Tip labels must match the
Samplecolumn in your traits file. - Branch lengths are required for accurate rate estimation.
2. Traits File (.csv)
- Rows: Genomes/Samples (matching tree tips).
- Columns: Binary traits (0 = Absent, 1 = Present; non numerical values will be st to 1 and blank values will be set to 0).
- Header: Required (Trait names).
- Index: The first column must contain sample names.
Example traits.csv:
Sample,PhenotypeX,GeneA,GeneB
E_coli_1,1,0,1
E_coli_2,1,1,0
E_coli_3,0,0,1
Usage
Run mode (single-run)
simphyni run \
--sample-name my_sample \
--tree path/to/tree.nwk \
--traits path/to/traits.csv \
--run-traits 0,1,2 \
--outdir my_analysis \
--cores 4 \
--temp_dir ./tmp \
--min_prev 0.05 \
--max_prev 0.95 \
--plot
--run-traitsspecifies a comma-separated list of column indices (0-indexed) in the traits CSV for “trait against all” comparisons. Use 'ALL' (default) to include all traits.
Run mode (batch)
Create a samples.csv file:
Sample,Tree,Traits,run_traits,MinPrev,MaxPrev
run1,tree1.nwk,traits1.csv,All,0.05,0.95
run2,tree2.nwk,traits2.csv,"0,1,2",0.05,0.90
run_traits,MinPrev, andMaxPrevare optional columns that will use default values if not provided.
Then execute:
simphyni run --samples samples.csv --cores 16
Run with HPC
First, download example cluster scripts:
simphyni download-cluster-scripts
Edit cluster config file for your computing cluster then install the approprate snakemake executor from the avalible catalog: https://snakemake.github.io/snakemake-plugin-catalog/index.html (slurm shown below):
pip install snakemake-executor-plugin-slurm
run simphyni with the --profile flag:
simphyni run --samples samples.csv --profile cluster_profile
For all run options:
simphyni run --help
Example data
Download and run example inputs using:
simphyni download-examples
simphyni run --samples example_inputs/simphyni_sample_info.csv --cores 8 --plot
Outputs
Outputs for each sample are placed in structured folders in the working directory or specified output directory in subdirectories by sample name, including:
Main Result Files
simphyni_result.csv
Contains the statistical results for all tested trait pairs.
| Column | Description |
|---|---|
T1 / T2 |
Identifiers for the two traits being compared. |
direction |
Direction of association: 1 = Positive, -1 = Negative. |
effect size |
Variance adjusted magnitude of the association. |
pval_naive |
Raw empirical P-value from the simulation. |
pval_bh |
P-value corrected using the Benjamini-Hochberg FDR method (recommended for phenotype-genotype tests). |
pval_by |
P-value corrected using the Benjamini-Yekutieli FDR method. (recommended for genotype-genotype tests) |
pval_bonf |
P-value corrected using the strict Bonferroni method. |
prevalence_T1 / _T2 |
Fraction of samples containing the trait (0.0 to 1.0). |
Additional Outputs
simphyni_object.pkl: Optional file containing the completed analysis object, parsable with an active SimPhyNI environment. Controlled with the--save-objectflag (not recommended for large analyses > 1,000,000 comparisons).- Plots: Heatmap summaries of tested associations (if
--plotis enabled).
Directory Structure
SimPhyNI/
├── simphyni/ # Core package
│ ├── Simulation/ # Simulation scripts
│ ├── scripts/ # Snakemake scripts
│ ├── Snakefile.py/ # Workflow build file
│ ├── simphyni_cli.py/ # Command line entry points
│ └── envs/simphyni.yaml # Conda environment (used in snakemake)
├── test/ # Testing suite
├── conda-recipe/ # Build recipe
├── cluster_scripts # Cluster configs for SLURM
├── example_inputs # Example inputs to run SimPhyNI
└── pyproject.toml
Contact
For questions, please open an issue or contact Ishaq Balogun at https://github.com/jpeyemi.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file simphyni-1.0.2.tar.gz.
File metadata
- Download URL: simphyni-1.0.2.tar.gz
- Upload date:
- Size: 37.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.23
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
46c2480c83444ee0fdd390b7ceaa0c1d5c1abf437fecc1ba99317506211a4650
|
|
| MD5 |
2dfc483701096dc1cb3fdc65f75b387f
|
|
| BLAKE2b-256 |
dbe4d8737a615a0e5de1fb7125eb9d4721d47a11b590c13a0bfc3dbdb8eb0691
|
File details
Details for the file simphyni-1.0.2-py3-none-any.whl.
File metadata
- Download URL: simphyni-1.0.2-py3-none-any.whl
- Upload date:
- Size: 29.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.23
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
a6df711af1c150fb8fb977d9deff84dacdbd6d2b699583fe78938f90d8f9d5ae
|
|
| MD5 |
bf6316cab71e46e4ff76da51cd174c9e
|
|
| BLAKE2b-256 |
de1bd21c2889c850cd12375d16c4e3ce0eab7ffe4f9d4b170dde98c89e28f4f5
|