Chronumental

Chronologies from monumental phylogenetic trees


Chronumental is a tool for creating a "time-tree" (where distance on the tree represents time) from a phylogenetic divergence-tree (where distance on the tree reflects a number of genetic substitutions).
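To make the two kinds of distance concrete: under a strict molecular clock, a branch carrying m substitutions corresponds to roughly m divided by the clock rate in elapsed time. The snippet below is only an illustrative sketch of that unit conversion. It is not Chronumental's fitting procedure, and the function name and the example rate are assumptions made for illustration.

```python
def branch_length_to_days(mutations: float, clock_rate_per_year: float) -> float:
    """Expected elapsed time in days for one branch under a strict clock.

    `clock_rate_per_year` is in the same units as the tree's branch lengths
    (e.g. substitutions per genome) per year.
    """
    if clock_rate_per_year <= 0:
        raise ValueError("clock rate must be positive")
    return mutations / clock_rate_per_year * 365.0


# Hypothetical rate of 24 substitutions per genome per year:
# a branch with 2 mutations then spans about one month.
print(branch_length_to_days(2, 24.0))
```

A real time tree cannot be built by converting each branch independently, because the sampling dates of the tips also constrain the node times; this is why a joint fit is needed.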

Unlike most other tools, Chronumental scales to extremely large trees, which can contain millions of nodes. It uses JAX to express the task of computing a time tree as a differentiable graph, which can be evaluated efficiently on a CPU or GPU.

Installation

Method 1: Using pipx (recommended for basic use - installs in its own isolated environment)

pip install --user pipx
pipx install chronumental

Method 2: In your python environment

pip install chronumental

Usage

This demo uses trees and metadata collated by the UShER team.

wget https://hgwdev.gi.ucsc.edu/~angie/UShER_SARS-CoV-2/2021/10/06/public-2021-10-06.all.nwk.gz
wget https://hgwdev.gi.ucsc.edu/~angie/UShER_SARS-CoV-2/2021/10/06/public-2021-10-06.metadata.tsv.gz
chronumental --tree public-2021-10-06.all.nwk.gz --dates public-2021-10-06.metadata.tsv.gz --steps 100
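Before a long run it can be worth checking that the metadata file has the `strain` and `date` columns that `--dates` expects, and how many dates are given to day, month, or year precision. The following is a minimal sketch using only the Python standard library; `read_dates` and `date_precision` are hypothetical helper names, not part of Chronumental.

```python
import csv
import gzip
import re
from collections import Counter


def read_dates(path: str) -> dict:
    """Return {strain: date} from a TSV (optionally gzipped) with strain/date columns."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        missing = {"strain", "date"} - set(reader.fieldnames or [])
        if missing:
            raise ValueError(f"missing columns: {sorted(missing)}")
        # Skip rows whose date field is empty.
        return {row["strain"]: row["date"] for row in reader if row["date"]}


def date_precision(value: str) -> str:
    """Classify a date string as 'day', 'month', 'year', or 'unknown'."""
    patterns = {
        "day": r"\d{4}-\d{2}-\d{2}",
        "month": r"\d{4}-\d{2}",
        "year": r"\d{4}",
    }
    for name, pattern in patterns.items():
        if re.fullmatch(pattern, value):
            return name
    return "unknown"


def summarise_precision(dates: dict) -> Counter:
    """Count how many strains have each level of date precision."""
    return Counter(date_precision(d) for d in dates.values())
```

For the demo data, `summarise_precision(read_dates("public-2021-10-06.metadata.tsv.gz"))` would report how many tips carry full dates, which is relevant when deciding whether to pass `--only_use_full_dates`.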

Parameters

usage: chronumental [-h] --tree TREE --dates DATES [--dates_out DATES_OUT] [--tree_out TREE_OUT] [--always_use_final_params]
                   [--treat_mutation_units_as_normalised_to_genome_size TREAT_MUTATION_UNITS_AS_NORMALISED_TO_GENOME_SIZE]
                   [--clock CLOCK] [--variance_dates VARIANCE_DATES] [--variance_branch_length VARIANCE_BRANCH_LENGTH]
                   [--steps STEPS] [--lr LR] [--name_all_nodes]
                   [--expected_min_between_transmissions EXPECTED_MIN_BETWEEN_TRANSMISSIONS] [--only_use_full_dates] [--model MODEL]
                   [--output_unit {days,years}] [--variance_on_clock_rate] [--enforce_exact_clock] [--use_gpu] [--use_wandb]
                   [--wandb_project_name WANDB_PROJECT_NAME] [--clipped_adam]

Convert a distance tree into time tree with distances in days.

optional arguments:
  -h, --help            show this help message and exit
  --tree TREE           an input newick tree, potentially gzipped, with branch lengths reflecting genetic distance in integer number
                        of mutations
  --dates DATES         A metadata file with columns strain and date (in "2020-01-02" format, or less precisely, "2021-01", "2021")
  --dates_out DATES_OUT
                        Output for date tsv (otherwise will use default)
  --tree_out TREE_OUT   Output for tree (otherwise will use default)
  --always_use_final_params
                        Will force the model to always use the final parameters, rather than simply using those that gave the lowest
                        loss
  --treat_mutation_units_as_normalised_to_genome_size TREAT_MUTATION_UNITS_AS_NORMALISED_TO_GENOME_SIZE
                        If your branch sizes, and mutation rate, are normalised to per-site values, then enter the genome size here.
  --clock CLOCK         Molecular clock rate. This should be in units of something per year, where the "something" is the units on
                        the tree. If not given we will attempt to estimate this by RTT. This is only used as a starting point,
                        unless you supply --enforce_exact_clock.
  --variance_dates VARIANCE_DATES
                        Scale factor for date distribution. Essentially a measure of how uncertain we think the measured dates are.
  --variance_branch_length VARIANCE_BRANCH_LENGTH
                        Scale factor for branch length distribution. Essentially how close we want to match the expectation of the
                        Poisson.
  --steps STEPS         Number of steps to use for the SVI
  --lr LR               Adam learning rate
  --name_all_nodes      Should we name all nodes in the output tree?
  --expected_min_between_transmissions EXPECTED_MIN_BETWEEN_TRANSMISSIONS
                        For forming the prior, an expected minimum time between transmissions in days
  --only_use_full_dates
                        Only use full dates, given to the precision of a day
  --model MODEL         Model type to use
  --output_unit {days,years}
                        Unit for the output branch lengths on the time tree.
  --variance_on_clock_rate
                        Will cause the clock rate to be drawn from a random distribution with a learnt variance.
  --enforce_exact_clock
                        Will cause the clock rate to be exactly fixed at the value specified in clock, rather than learnt
  --use_gpu             Will attempt to use the GPU. You will need a version of CUDA installed to suit Numpyro.
  --use_wandb           This flag will trigger the use of Weights and Biases to log the fitting process. This must be installed with
                        'pip install wandb'
  --wandb_project_name WANDB_PROJECT_NAME
                        Wandb project name
  --clipped_adam        Will use the clipped version of Adam
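The `--clock` help mentions estimating the rate "by RTT", that is, root-to-tip regression: each tip's genetic distance from the root is regressed on its sampling time, and the slope is taken as the clock rate. The sketch below shows the general technique with an ordinary least-squares fit in plain Python. It is not Chronumental's own implementation, and the function name and toy data are assumptions for illustration.

```python
def estimate_clock_rate(times_in_years, root_to_tip_distances):
    """Least-squares fit of root-to-tip distance against sampling time.

    Returns (slope, intercept). The slope is the estimated clock rate in
    tree units per year.
    """
    n = len(times_in_years)
    if n < 2 or n != len(root_to_tip_distances):
        raise ValueError("need two or more paired observations")
    mean_t = sum(times_in_years) / n
    mean_d = sum(root_to_tip_distances) / n
    # Sum of squared deviations in time, and the time/distance cross term.
    sxx = sum((t - mean_t) ** 2 for t in times_in_years)
    if sxx == 0:
        raise ValueError("sampling times must not all be identical")
    sxy = sum(
        (t - mean_t) * (d - mean_d)
        for t, d in zip(times_in_years, root_to_tip_distances)
    )
    slope = sxy / sxx
    intercept = mean_d - slope * mean_t
    return slope, intercept


# Toy data: three tips sampled half a year apart, gaining 12 mutations each step.
rate, offset = estimate_clock_rate([2020.0, 2020.5, 2021.0], [0.0, 12.0, 24.0])
print(rate)
```

According to the help text, a rate obtained this way (or supplied via `--clock`) is only a starting point for the fit unless `--enforce_exact_clock` is given.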

Similar tools

TreeTime is a more advanced tool for inferring time trees. If your dataset has, say, fewer than 10,000 nodes rather than millions, you are best off trying TreeTime first. The TreeTime readme also links to other similar tools.
