Skip to main content

SimPhyni: a tool for phylogenetic trait simulation and inference.

Project description

SimPhyNI

Overview

SimPhyNI (Simulation-based Phylogenetic iNteraction Inference) is a phylogenetically-aware framework for detecting evolutionary associations between binary traits (e.g., gene presence/absence, major/minor alleles, binary phenotypes) on microbial phylogenetic trees. This tool leverages phylogenetic information to correct for spurious associations caused by the relatedness of sister taxa.

This pipeline is designed to:

  • Infer evolutionary parameters for traits (gain/loss rates, time to emergence, ancestral states)
  • Estimate trait co-occurence null models through independent simulation of traits
  • Output statistical results for associations

Getting Started

Installation

First, ensure bioconda and conda-forge are channels are configured:

conda config --add channels conda-forge
conda config --add channels bioconda

Create a new environment:

conda create -n simphyni
conda activate simphyni

then install SimPhyNI from bioconda:

conda install simphyni

test installation:

simphyni version

Input Specifications

1. Phylogenetic Tree (.nwk)

  • Standard Newick format.
  • Must be rooted (both outgroup and midpoint are acceptable).
  • Tip labels must match the Sample column in your traits file.
  • Branch lengths are required for accurate rate estimation.

2. Traits File (.csv)

  • Rows: Genomes/Samples (matching tree tips).
  • Columns: Binary traits (0 = Absent, 1 = Present; non numerical values will be st to 1 and blank values will be set to 0).
  • Header: Required (Trait names).
  • Index: The first column must contain sample names.

Example traits.csv:

Sample,PhenotypeX,GeneA,GeneB
E_coli_1,1,0,1
E_coli_2,1,1,0
E_coli_3,0,0,1

Usage

Run mode (single-run)

simphyni run \
  --sample-name my_sample \
  --tree path/to/tree.nwk \
  --traits path/to/traits.csv \
  --run-traits 0,1,2 \
  --outdir my_analysis \
  --cores 4 \
  --temp_dir ./tmp \
  --min_prev 0.05 \
  --max_prev 0.95 \
  --plot
  • --run-traits specifies a comma-separated list of column indices (0-indexed) in the traits CSV for “trait against all” comparisons. Use 'ALL' (default) to include all traits.

Run mode (batch)

Create a samples.csv file:

Sample,Tree,Traits,run_traits,MinPrev,MaxPrev
run1,tree1.nwk,traits1.csv,All,0.05,0.95
run2,tree2.nwk,traits2.csv,"0,1,2",0.05,0.90
  • run_traits, MinPrev, and MaxPrev are optional columns that will use default values if not provided.

Then execute:

simphyni run --samples samples.csv --cores 16

Run with HPC

First, download example cluster scripts:

simphyni download-cluster-scripts

Edit cluster config file for your computing cluster then install the approprate snakemake executor from the avalible catalog: https://snakemake.github.io/snakemake-plugin-catalog/index.html (slurm shown below):

pip install snakemake-executor-plugin-slurm

run simphyni with the --profile flag:

simphyni run --samples samples.csv --profile cluster_profile

For all run options:

simphyni run --help

Example data

Download and run example inputs using:

simphyni download-examples
simphyni run --samples example_inputs/simphyni_sample_info.csv --cores 8 --plot

Outputs

Outputs for each sample are placed in structured folders in the working directory or specified output directory in subdirectories by sample name, including:

Main Result Files

simphyni_result.csv Contains the statistical results for all tested trait pairs.

Column Description
T1 / T2 Identifiers for the two traits being compared.
direction Direction of association: 1 = Positive, -1 = Negative.
effect size Variance adjusted magnitude of the association.
pval_naive Raw empirical P-value from the simulation.
pval_bh P-value corrected using the Benjamini-Hochberg FDR method (recommended for phenotype-genotype tests).
pval_by P-value corrected using the Benjamini-Yekutieli FDR method. (recommended for genotype-genotype tests)
pval_bonf P-value corrected using the strict Bonferroni method.
prevalence_T1 / _T2 Fraction of samples containing the trait (0.0 to 1.0).

Additional Outputs

  • simphyni_object.pkl: Optional file containing the completed analysis object, parsable with an active SimPhyNI environment. Controlled with the --save-object flag (not recommended for large analyses > 1,000,000 comparisons).
  • Plots: Heatmap summaries of tested associations (if --plot is enabled).

Directory Structure

SimPhyNI/
├── simphyni/               # Core package
│   ├── Simulation/         # Simulation scripts
│   ├── scripts/            # Snakemake scripts
│   ├── Snakefile.py/       # Workflow build file
│   ├── simphyni_cli.py/    # Command line entry points
│   └── envs/simphyni.yaml  # Conda environment (used in snakemake)
├── test/                   # Testing suite
├── conda-recipe/           # Build recipe 
├── cluster_scripts         # Cluster configs for SLURM
├── example_inputs          # Example inputs to run SimPhyNI
└── pyproject.toml

Contact

For questions, please open an issue or contact Ishaq Balogun at https://github.com/jpeyemi.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

simphyni-1.0.2.tar.gz (37.7 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

simphyni-1.0.2-py3-none-any.whl (29.7 kB view details)

Uploaded Python 3

File details

Details for the file simphyni-1.0.2.tar.gz.

File metadata

  • Download URL: simphyni-1.0.2.tar.gz
  • Upload date:
  • Size: 37.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.23

File hashes

Hashes for simphyni-1.0.2.tar.gz
Algorithm Hash digest
SHA256 46c2480c83444ee0fdd390b7ceaa0c1d5c1abf437fecc1ba99317506211a4650
MD5 2dfc483701096dc1cb3fdc65f75b387f
BLAKE2b-256 dbe4d8737a615a0e5de1fb7125eb9d4721d47a11b590c13a0bfc3dbdb8eb0691

See more details on using hashes here.

File details

Details for the file simphyni-1.0.2-py3-none-any.whl.

File metadata

  • Download URL: simphyni-1.0.2-py3-none-any.whl
  • Upload date:
  • Size: 29.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.23

File hashes

Hashes for simphyni-1.0.2-py3-none-any.whl
Algorithm Hash digest
SHA256 a6df711af1c150fb8fb977d9deff84dacdbd6d2b699583fe78938f90d8f9d5ae
MD5 bf6316cab71e46e4ff76da51cd174c9e
BLAKE2b-256 de1bd21c2889c850cd12375d16c4e3ce0eab7ffe4f9d4b170dde98c89e28f4f5

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page