Command-line tool for simulating predictive datasets from MrBayes' output.

Project description

predsim is a simple command-line tool for simulating predictive datasets from MrBayes’ output files. Datasets can be simulated under the GTR+G+I substitution model or any nested variant available in MrBayes (e.g. JC69 or HKY85). The code is contained in a single module that can be imported with Python’s import mechanism. The tool uses Seq-Gen to simulate the DNA sequences and builds on the third-party library DendroPy.

The code has been tested with Python 3.3 and 3.6.
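
For orientation, the sketch below shows one way to read the parameter samples in a MrBayes p-file with only the standard library. It assumes the usual layout (an optional "[ID: ...]" comment line, then a tab-separated header row and one row per sampled generation) and column names such as pi(A) and alpha. read_pfile is a hypothetical helper, not part of predsim, and the record in the example is made up.

```python
import csv
import io


def read_pfile(handle):
    """Yield one dict of floats per sampled generation in a MrBayes p-file.

    Hypothetical helper: assumes an optional "[ID: ...]" first line followed
    by a tab-separated header row and one data row per sample.
    """
    # Drop bracketed comment lines and blank lines before handing to csv.
    lines = (line for line in handle
             if line.strip() and not line.startswith("["))
    for row in csv.DictReader(lines, delimiter="\t"):
        # Skip empty/None keys that a trailing tab or ragged row would create.
        yield {key: float(value) for key, value in row.items() if key}


# A made-up p-file with a single sample.
example = (
    "[ID: 9409050143]\n"
    "Gen\tLnL\tTL\tpi(A)\tpi(C)\tpi(G)\tpi(T)\talpha\tpinvar\n"
    "1000\t-5021.7\t1.25\t0.30\t0.20\t0.20\t0.30\t0.52\t0.08\n"
)
records = list(read_pfile(io.StringIO(example)))
print(records[0]["alpha"])  # 0.52
```

Parsing every column as a float keeps the sketch short; a real reader might keep Gen as an integer.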

Source repository: https://github.com/jmenglund/predsim


Prerequisites

  • Python 3.3+

  • The Python library DendroPy (version 4.0 or higher)

  • The command-line tool Seq-Gen

An easy way to get Python working on your computer is to install the free Anaconda distribution.

Installation

For most users, the easiest way is probably to install the latest version hosted on PyPI:

$ pip install predsim

The project is hosted at https://github.com/jmenglund/predsim and can also be installed using git:

$ git clone https://github.com/jmenglund/predsim.git
$ cd predsim
$ python setup.py install

You may consider installing predsim and its required Python packages within a virtual environment in order to avoid cluttering your system’s Python path.

Usage

$ predsim --help
usage: predsim [-h] [-V] [-l N] [-f #A #C #G #T] [-g N] [-s N] [-n N]
               [-o {nexus,phylip}] [-p FILE] [--seeds-file FILE]
               [--commands-file FILE] [--trees-file FILE]
               pfile tfile

A command-line utility that reads posterior output of MrBayes and simulates
predictive datasets with Seq-Gen.

positional arguments:
  pfile                 path to a MrBayes p-file
  tfile                 path to a MrBayes t-file

optional arguments:
  -h, --help            show this help message and exit
  -V, --version         show program's version number and exit
  -l N, --length N      sequence length (default: 1000)
  -f #A #C #G #T, --freqs #A #C #G #T
                        base frequencies (overrides any base frequencies in
                        MrBayes' output)
  -g N, --gamma-cats N  number of gamma rate categories (default: continuous)
  -s N, --skip N        number of records (trees) to skip at the beginning of
                        the sample (default: 0)
  -n N, --num-records N
                        number of records (trees) to use in the simulation
  -o {nexus,phylip}, --out-format {nexus,phylip}
                        output format (default: "nexus")
  -p FILE, --seqgen-path FILE
                        path to a Seq-Gen executable (default: "seq-gen")
  --seeds-file FILE     path to file with seed numbers (e.g. for debugging
                        purposes)
  --commands-file FILE  path to output file with commands used by Seq-Gen
  --trees-file FILE     path to output file with trees used by Seq-Gen
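
The -s/--skip and -n/--num-records options select a contiguous run of posterior samples, and the help text treats each record as belonging to one tree. A minimal sketch of that selection, assuming p-file records and t-file trees line up one-to-one in file order (select_records is a hypothetical name, not predsim's own code):

```python
from itertools import islice


def select_records(records, skip=0, num_records=None):
    """Return records after dropping the first `skip`, keeping at most
    `num_records` of the rest (all of them when `num_records` is None)."""
    stop = None if num_records is None else skip + num_records
    return list(islice(records, skip, stop))


# Hypothetical pairing of p-file records with t-file trees by position.
params = ["p0", "p1", "p2", "p3", "p4"]
trees = ["t0", "t1", "t2", "t3", "t4"]
print(select_records(zip(params, trees), skip=1, num_records=2))
# [('p1', 't1'), ('p2', 't2')]
```

Because islice simply stops at the end of its input, a skip or count larger than the sample yields fewer records instead of raising; how predsim itself handles that case is not covered here.
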

  • If base frequencies are missing from MrBayes’ output, they must be set manually with the -f (or --freqs) flag.

  • It is recommended that you use the --commands-file and --trees-file flags to check the input given to Seq-Gen.
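
To make the flags above more concrete, here is a sketch of how one posterior sample could be turned into a Seq-Gen argument list. It assumes GTR-style p-file columns (r(A<->C) to r(G<->T), pi(A) to pi(T), alpha, pinvar) and Seq-Gen's -m, -l, -r, -f, -a, -g, -i, -z and -o options. seqgen_args is hypothetical, models reported with other parameters (such as kappa for HKY85) are not handled, and the exact argument syntax should be checked against the Seq-Gen documentation.

```python
import shlex

RATE_KEYS = ("r(A<->C)", "r(A<->G)", "r(A<->T)",
             "r(C<->G)", "r(C<->T)", "r(G<->T)")
FREQ_KEYS = ("pi(A)", "pi(C)", "pi(G)", "pi(T)")
OUT_FLAGS = {"nexus": "-on", "phylip": "-op"}


def _join(values):
    """Comma-separate numbers, e.g. [0.25, 0.25] -> '0.25,0.25'."""
    return ",".join(repr(float(v)) for v in values)


def seqgen_args(record, length=1000, freqs=None, gamma_cats=None,
                out_format="nexus", seed=None, seqgen_path="seq-gen"):
    """Build a Seq-Gen argument list for one posterior sample (hypothetical)."""
    args = [seqgen_path, "-mGTR", "-l{}".format(length), OUT_FLAGS[out_format]]
    # Relative substitution rates, only if the record has all six.
    if all(key in record for key in RATE_KEYS):
        args.append("-r" + _join(record[key] for key in RATE_KEYS))
    # Frequencies given with -f/--freqs override those in the p-file.
    if freqs is None:
        if not all(key in record for key in FREQ_KEYS):
            raise ValueError("base frequencies missing: set them with -f/--freqs")
        freqs = [record[key] for key in FREQ_KEYS]
    args.append("-f" + _join(freqs))
    # Gamma shape; -g (discrete categories) is added only when requested.
    if "alpha" in record:
        args.append("-a" + _join([record["alpha"]]))
        if gamma_cats is not None:
            args.append("-g{}".format(gamma_cats))
    # Proportion of invariable sites.
    if "pinvar" in record:
        args.append("-i" + _join([record["pinvar"]]))
    if seed is not None:
        args.append("-z{}".format(seed))
    return args


record = {"r(A<->C)": 0.1, "r(A<->G)": 0.3, "r(A<->T)": 0.1,
          "r(C<->G)": 0.1, "r(C<->T)": 0.3, "r(G<->T)": 0.1,
          "pi(A)": 0.3, "pi(C)": 0.2, "pi(G)": 0.2, "pi(T)": 0.3,
          "alpha": 0.5, "pinvar": 0.1}
# One shell-quoted line per command, as one might log it for inspection.
print(" ".join(shlex.quote(arg) for arg in seqgen_args(record, seed=42)))
# seq-gen -mGTR -l1000 -on -r0.1,0.3,0.1,0.1,0.3,0.1 -f0.3,0.2,0.2,0.3 -a0.5 -i0.1 -z42
```

Supplying the tree itself to Seq-Gen (for example on standard input) is left out; the sketch only covers the mapping from posterior parameters to command-line options.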

Running the tests

Testing is carried out with pytest:

$ pytest test_predsim.py

Test coverage can be calculated with Coverage.py using the following commands:

$ coverage run -m pytest test_predsim.py
$ coverage report -m predsim.py

The code follows the PEP 8 style conventions, which can be checked with pycodestyle:

$ pycodestyle predsim.py test_predsim.py setup.py

License

predsim is distributed under the MIT license.

Citing

If you use results produced with this package in a scientific publication, please mention the package name in the text and cite the Zenodo DOI of this project:

DOI-URI

You can select your preferred citation style in the “Cite as” section on the Zenodo page.

predsim relies on other software that should also be cited. Suggested citations for Seq-Gen and DendroPy are given below:

  • Rambaut A, Grassly NC. 1997. Seq-Gen: an application for the Monte Carlo simulation of DNA sequence evolution along phylogenetic trees. Comput. Appl. Biosci. 13:235–238. DOI: 10.1093/bioinformatics/13.3.235

  • Sukumaran J, Holder MT. 2010. DendroPy: a Python library for phylogenetic computing. Bioinformatics 26:1569–1571. DOI: 10.1093/bioinformatics/btq228

Author

Markus Englund, orcid.org/0000-0003-1688-7112

Download files

Source Distribution

predsim-0.7.0.tar.gz (10.3 kB)

Uploaded Source

Built Distribution

predsim-0.7.0-py3-none-any.whl (8.5 kB)

Uploaded Python 3

File details

Details for the file predsim-0.7.0.tar.gz.

File metadata

  • Download URL: predsim-0.7.0.tar.gz
  • Size: 10.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/41.0.0 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.6.7

File hashes

Hashes for predsim-0.7.0.tar.gz
Algorithm Hash digest
SHA256 f4c16f1cf320bf060b461888bee6c7ec7f6cca4ad1492cde04e79fb22d06d6ec
MD5 7b0732f5b3194926d9db0b85a0c6d8b8
BLAKE2b-256 2951e1d9265afa85ea44a8be80520f542bfc1313b2478274a1a1f06552c9f50f

File details

Details for the file predsim-0.7.0-py3-none-any.whl.

File metadata

  • Download URL: predsim-0.7.0-py3-none-any.whl
  • Size: 8.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/41.0.0 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.6.7

File hashes

Hashes for predsim-0.7.0-py3-none-any.whl
Algorithm Hash digest
SHA256 e4bc49dc9c092dc4d30262f0ce96be70a42f617cb11a834d9575eec11c58688e
MD5 757136aaeffc7998927e5ea7f47b36ad
BLAKE2b-256 de2f2b0b88764bfb329d3af553e12bbb6c1f5f122ac43ab37bca1bd35bd8cb53
