Phylogenetic inference with Stan

phylostan: phylogenetic inference using Stan

Introduction

phylostan is a Python tool for inferring phylogenetic trees from nucleotide sequence alignments. It generates a variety of phylogenetic models written in the Stan language and, through the pystan library, gives access to Stan's variational inference and sampling (NUTS and HMC) engines. The program and an evaluation of its performance are described in an article (see Citing phylostan below). The data and scripts used to generate the results can be found here.

Features

Phylogenetic model components:

- Nucleotide substitution models: JC69, HKY, GTR
- Rate heterogeneity: discretized Weibull distribution and general discrete distribution
- Tree without clock constraint, with a uniform prior on topology
- Time tree:
  - Homochronous sequences: all sequences share the same sampling date
  - Heterochronous sequences: sequences sampled at different time points
- Molecular clocks:
  - Strict
  - Autocorrelated
  - Uncorrelated: log-normal hierarchical prior
- Coalescent models:
  - Constant population size
  - Skyride
  - Skygrid
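The discretized Weibull rate heterogeneity component can be illustrated with a short Python sketch. It assumes the common scheme in which each of the C categories takes the Weibull (scale 1) quantile at the midpoint of its probability interval, after which the rates are rescaled to have mean 1. The function name `weibull_category_rates` is hypothetical, and the Stan code generated by phylostan may discretize the distribution differently.

```python
import math


def weibull_category_rates(shape, categories):
    """Hypothetical helper: discretized Weibull site rates.

    Each category rate is the Weibull(shape, scale=1) quantile evaluated at
    the midpoint of its probability interval, p = (2i + 1) / (2C). The rates
    are then divided by their mean so the average rate across categories is 1.
    """
    if shape <= 0 or categories < 1:
        raise ValueError("shape must be positive and categories >= 1")
    rates = []
    for i in range(categories):
        p = (2 * i + 1) / (2 * categories)
        # Weibull quantile function with scale 1: (-ln(1 - p)) ** (1 / shape)
        rates.append((-math.log(1.0 - p)) ** (1.0 / shape))
    mean = sum(rates) / categories
    return [r / mean for r in rates]


# Four categories (-C 4) with the shape estimate reported in the quickstart.
rates = weibull_category_rates(0.488, 4)
```

A shape below 1, such as the 0.488 reported in the quickstart, yields a strongly skewed set of rates: most sites evolve slowly while the last category evolves much faster.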

Algorithms provided by Stan:

- Variational inference:
  - Mean-field distribution
  - Full-rank distribution
- No U-Turn Sampler (NUTS)
- Hamiltonian Monte Carlo (HMC)

Prerequisites

Program/Library   Version                          Description
python            tested on python 3.6, 3.7, 3.9
pystan            >=2.19, <3                       API for Stan
dendropy                                           Library for manipulating trees and alignments
numpy             >=1.7
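Before installing, it can help to check whether the packages already present in an environment satisfy the constraints above. The sketch below uses the standard-library `importlib.metadata` module, which requires Python 3.8 or later (so it will not run on the 3.6 and 3.7 versions listed). The helper names and the `REQUIREMENTS` dictionary are hypothetical; the dictionary only encodes the bounds from the table.

```python
import re
from importlib import metadata

# Hypothetical encoding of the table above: (minimum inclusive, maximum exclusive).
REQUIREMENTS = {
    "pystan": ((2, 19), (3,)),
    "numpy": ((1, 7), None),
    "DendroPy": (None, None),
}


def parse_version(text):
    """Turn '2.19.1.1' into (2, 19, 1, 1); stop at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
        if match.end() != len(piece):
            break  # e.g. '0rc1': keep 0 and ignore the pre-release suffix
    return tuple(parts)


def satisfies(version, minimum=None, maximum=None):
    """True if minimum <= version < maximum (either bound may be None)."""
    v = parse_version(version)
    if minimum is not None and v < minimum:
        return False
    if maximum is not None and v >= maximum:
        return False
    return True


def check_environment():
    """Print the status of each dependency in the current environment."""
    for name, (low, high) in REQUIREMENTS.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            print(f"{name}: not installed")
            continue
        status = "ok" if satisfies(installed, low, high) else "unsupported version"
        print(f"{name} {installed}: {status}")


if __name__ == "__main__":
    check_environment()
```

Comparing version tuples keeps the sketch dependency-free; a production check would more likely rely on a dedicated version-parsing library.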

You can install phylostan using pip:

pip install phylostan

You can also run it directly from a local copy of the repository:

python -m phylostan.phylostan <COMMAND>

where <COMMAND> is either the build or run command.

Command-line usage

phylostan provides two sub-commands:

build: generates a Stan file, i.e. a text file containing the model.
run: compiles the Stan file if needed and runs it with the data.

The two steps are kept separate so that the user can edit the Stan model in between, mainly to modify the priors.

To get help on the build or run commands:

phylostan build --help
phylostan run --help

Quickstart

We are going to use the fluA.fa alignment and fluA.tree tree files. This dataset contains 69 influenza A virus haemagglutinin nucleotide sequences isolated between 1981 and 1998.

First, a Stan script needs to be generated using the build command:

cd examples/fluA
phylostan build -s fluA-GTR-W4.stan  -m HKY -C 4 \
 --heterochronous --estimate_rate --clock strict --coalescent constant

This command creates the Stan file fluA-GTR-W4.stan, which implements the following model:

- Hasegawa, Kishino and Yano (HKY) nucleotide substitution model
- Rate heterogeneity with 4 rate categories using the Weibull distribution
- Sequences sampled at different time points (heterochronous)
- Coalescent prior with a constant effective population size
- Strict molecular clock whose substitution rate is estimated
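The HKY model is parameterized by the transition/transversion ratio kappa (reported as HKY (kappa) in the run summary) and the equilibrium base frequencies. As a sketch of what these parameters mean, the Python code below builds the HKY instantaneous rate matrix and rescales it to one expected substitution per unit time. The function name, the A, C, G, T state ordering and the normalization convention are assumptions of this example, not taken from the Stan code that phylostan generates.

```python
def hky_rate_matrix(kappa, freqs):
    """Hypothetical helper: normalized HKY rate matrix Q.

    States are ordered A, C, G, T. Transitions (A<->G, C<->T) occur at rate
    kappa * pi_j and transversions at rate pi_j. Q is scaled so that the
    expected number of substitutions per unit time at equilibrium is 1.
    """
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("base frequencies must sum to 1")
    transitions = {(0, 2), (2, 0), (1, 3), (3, 1)}
    q = [[0.0] * 4 for _ in range(4)]
    for i in range(4):
        for j in range(4):
            if i != j:
                q[i][j] = (kappa if (i, j) in transitions else 1.0) * freqs[j]
        # The diagonal entry makes each row sum to zero.
        q[i][i] = -sum(q[i][j] for j in range(4) if j != i)
    # Expected substitution rate at equilibrium: -sum_i pi_i * q_ii.
    scale = -sum(freqs[i] * q[i][i] for i in range(4))
    return [[value / scale for value in row] for row in q]


# kappa from the quickstart output; equal base frequencies for illustration.
Q = hky_rate_matrix(5.58, [0.25, 0.25, 0.25, 0.25])
```

With kappa = 1 and equal base frequencies, this matrix reduces to the JC69 model, where every off-diagonal rate is 1/3.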

In the second step, we compile and run the script with our data:

phylostan run -s fluA-GTR-W4.stan  -m HKY -C 4 \
 --heterochronous --estimate_rate --clock strict --coalescent constant \
 -i fluA.fa -t fluA.tree -o fluA -q meanfield

The run command requires the data (tree and alignment) and an output file name (-o). It must also be given the same model options that were passed to the build command. The output consists of four files:

fluA: this is the output file of Stan. It contains the samples drawn from the variational distribution (or the MCMC samples when a sampler is used).
fluA.diag: this file is also generated by Stan and contains diagnostic information such as the ELBO at each iteration.
fluA.trees: this is a NEXUS file containing trees. It can be opened with a program such as FigTree or summarized using TreeAnnotator from BEAST or BEAST2.
fluA-GTR-W4.pkl: the compiled Stan model is saved in this binary file and reused automatically by later runs. Use the --compile option to force recompilation.
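Because run must receive the same model options as build, a small wrapper can keep the two invocations consistent. The Python sketch below assembles both command lines from a single list of options and runs them with `subprocess`. The flags are exactly those used in this quickstart, while `MODEL_OPTIONS`, `phylostan_command` and `build_and_run` are hypothetical names. Setting `use_module=True` switches to the `python -m phylostan.phylostan` form described earlier instead of the installed `phylostan` executable.

```python
import subprocess
import sys

# Model options shared by both sub-commands (taken from the quickstart).
MODEL_OPTIONS = [
    "-s", "fluA-GTR-W4.stan", "-m", "HKY", "-C", "4",
    "--heterochronous", "--estimate_rate",
    "--clock", "strict", "--coalescent", "constant",
]


def phylostan_command(subcommand, extra=(), use_module=False):
    """Return the argument list for 'phylostan <subcommand>' plus options."""
    if use_module:
        prefix = [sys.executable, "-m", "phylostan.phylostan"]
    else:
        prefix = ["phylostan"]
    return prefix + [subcommand] + MODEL_OPTIONS + list(extra)


def build_and_run(alignment, tree, output, algorithm_args, use_module=False):
    """Generate the Stan file, then run it on the data; stop on the first failure."""
    subprocess.run(phylostan_command("build", use_module=use_module), check=True)
    run_args = ["-i", alignment, "-t", tree, "-o", output] + list(algorithm_args)
    subprocess.run(phylostan_command("run", run_args, use_module), check=True)
```

Called from examples/fluA, `build_and_run("fluA.fa", "fluA.tree", "fluA", ["-q", "meanfield"])` issues the same two commands as above. If you have edited the Stan file by hand (for example, to change priors), run only the run step so that a fresh build does not replace your edits.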

At the end of the run, phylostan prints the mean and 95% credible interval of the parameters of interest:

Weibull (shape) mean: 0.488 95% CI: (0.383,0.616)
Strict clock (rate) mean: 0.00499 95% CI: (0.00432,0.00577)
Constant population size (theta) mean: 4.03 95% CI: (3.14,5.05)
HKY (kappa) mean: 5.58 95% CI: (4.37 7.039)
Root height mean: 18.96 95% CI: (18.36 19.74)
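The summary above is computed from the samples stored in the output file. The Python sketch below produces the same kind of summary. It assumes a Stan-style CSV layout (comment lines starting with `#`, then a header row and one row per draw) and an equal-tailed 95% interval built from linearly interpolated 2.5% and 97.5% quantiles. The helper names are hypothetical, and the exact file layout and interval definition phylostan uses are not confirmed here.

```python
import csv


def read_stan_csv(path):
    """Return {column name: list of floats} from a Stan CSV output file.

    Lines starting with '#' (configuration and adaptation comments) are skipped.
    """
    with open(path, newline="") as handle:
        rows = csv.reader(line for line in handle if not line.startswith("#"))
        header = next(rows)
        columns = {name: [] for name in header}
        for row in rows:
            if not row:
                continue
            for name, value in zip(header, row):
                columns[name].append(float(value))
    return columns


def quantile(sorted_values, q):
    """Linearly interpolated quantile of an already sorted list."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] * (1 - fraction) + sorted_values[upper] * fraction


def summarize(samples, level=0.95):
    """Mean and equal-tailed credible interval of a list of draws."""
    ordered = sorted(samples)
    tail = (1 - level) / 2
    mean = sum(ordered) / len(ordered)
    return mean, quantile(ordered, tail), quantile(ordered, 1 - tail)
```

For example, `summarize(read_stan_csv("fluA")["kappa"])` would summarize a column named `kappa`, assuming the output file contains a column with that name.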

In this example, we used a mean-field distribution (-q meanfield) to approximate the posterior with variational inference. Since the Stan model is already compiled, we can run the NUTS algorithm without regenerating the Stan file by issuing:

phylostan run -s fluA-GTR-W4.stan  -m HKY -C 4 \
 --heterochronous --estimate_rate --clock strict --coalescent constant \
 -i fluA.fa -t fluA.tree -o fluA -a nuts

NUTS is much slower than variational inference (though more accurate), so it is best suited to small datasets.

Citing phylostan

Mathieu Fourment and Aaron E. Darling. Evaluating Probabilistic Programming and Fast Variational Bayesian Inference in Phylogenetics. PeerJ 7:e8272, 2019. doi: 10.7717/peerj.8272.

@article{fourment2019phylostan,
  title    = "Evaluating probabilistic programming and fast variational
              {B}ayesian inference in phylogenetics",
  author   = "Fourment, Mathieu and Darling, Aaron E",
  journal  = "PeerJ",
  volume   =  7,
  pages    = "e8272",
  month    =  dec,
  year     =  2019
}

Release history

1.0.5.post1 (this version)   Dec 7, 2022
1.0.5                        May 26, 2022
1.0.4                        May 31, 2021
1.0.3                        Oct 16, 2019
1.0.2                        Jul 15, 2019
1.0.1                        Jul 7, 2019

Download files

Source Distributions

No source distribution files are available for this release.

Built Distribution

phylostan-1.0.5.post1-py3-none-any.whl (37.4 kB)

Uploaded Dec 7, 2022 Python 3

File details

Details for the file phylostan-1.0.5.post1-py3-none-any.whl.

File metadata

Download URL: phylostan-1.0.5.post1-py3-none-any.whl
Upload date: Dec 7, 2022
Size: 37.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.8.0 pkginfo/1.9.2 readme-renderer/34.0 requests/2.27.1 requests-toolbelt/0.10.1 urllib3/1.26.13 tqdm/4.64.1 importlib-metadata/4.8.3 keyring/23.4.1 rfc3986/1.5.0 colorama/0.4.5 CPython/3.6.7

File hashes

Hashes for phylostan-1.0.5.post1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`7b2282c6f34bc4093f0117646091e85024a60dc2559e97408f8b92f5a6d7ef51`
MD5	`9dda9c5d363d01459640b7f656c2a0e0`
BLAKE2b-256	`1cd23863912134e5f74cf0248c2e54f6cac2f116d254b100689c9c6877181926`
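The SHA256 digest above can be used to check a downloaded wheel before installing it. Below is a minimal sketch using Python's standard `hashlib` module. The helper names are hypothetical, and the expected value is the SHA256 digest listed in the table.

```python
import hashlib

EXPECTED_SHA256 = "7b2282c6f34bc4093f0117646091e85024a60dc2559e97408f8b92f5a6d7ef51"


def sha256_of_chunks(chunks):
    """Hex SHA256 digest of an iterable of bytes objects."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def sha256_of_file(path, chunk_size=1 << 16):
    """Hex SHA256 digest of a file, read in chunks to bound memory use."""
    with open(path, "rb") as handle:
        return sha256_of_chunks(iter(lambda: handle.read(chunk_size), b""))


def verify_wheel(path, expected=EXPECTED_SHA256):
    """True if the file at path has the expected SHA256 digest."""
    return sha256_of_file(path) == expected.lower()
```

For example, `verify_wheel("phylostan-1.0.5.post1-py3-none-any.whl")` returns True only if the downloaded file matches the published digest.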
