
Quality control for phylogenetic pipelines using pytest


Phytest: Quality Control for Phylogenetic Analyses.


Documentation: https://phytest-devs.github.io/phytest

Code: https://github.com/phytest-devs/phytest

Tutorials: https://github.com/phytest-devs?q=example


Installation

Install phytest using pip:

pip install phytest

Quick Start

Phytest is a tool for automating quality control checks on sequence, tree and metadata files during phylogenetic analyses. It verifies that these files pass a set of user-defined quality control tests.

Here we will create example data files to run our tests on.

Create an alignment file in FASTA format, example.fasta:

>Sequence_A
ATGAGATCCCCGATAGCGAGCTAGCGATCGCAGCGACTCAGCAGCTACAGCGCAGAGGAGAGAGAGGCCCCTATTTACTAGAGCTCCAGATATAGNNTAG
>Sequence_B
ATGAGATCCCCGATAGCGAGCTAGXGATCGCAGCGACTCAGCAGCTACAGCGCAGAGGAGAGAGAGGCCCCTATTTACTAGAGCTCCAGATATAGNNTAG
>Sequence_C
ATGAGA--CCCGATAGCGAGCTAGCGATCGCAGCGACTCAGCAGCTACAGCGCAGAGGAGAGAGAGGCCCCTATTTACTAGAGCTCCAGATATAGNNTAG
>Sequence_D
ATGAGATCCCCGATAGCGAGCTAGCGATNNNNNNNNNNNNNNNNNTACAGCGCAGAGGAGAGAGAGGCCCCTATTTACTAGAGCTCCAGATATAGNNTAG

Create a tree file in Newick format, example.tree:

(Sequence_A:1,Sequence_B:0.2,(Sequence_C:0.3,Sequence_D:0.4):0.5);

Writing a test file

We want to enforce the following constraints on our data:
  1. The alignment has 4 sequences

  2. The sequences have a length of 100

  3. The sequences only contain the characters A, T, G, C, N and -

  4. The sequences are only allowed to contain single-base deletions

  5. The longest stretch of Ns is at most 10

  6. The tree has 4 tips

  7. The tree is bifurcating

  8. The alignment and tree have the same names

  9. All internal branches are longer than the given threshold

  10. There are no outlier branches in the tree
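To make constraints 1 to 5 concrete, here is a minimal plain-Python sketch of what they check. It is an illustration only, not how Phytest implements these checks: the helpers read_fasta, longest_run and invalid_characters are hypothetical names introduced here, and the records are shortened stand-ins rather than the full 100-column example alignment.

```python
import re


def read_fasta(text):
    """Parse FASTA text into a {name: sequence} dict (hypothetical helper)."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def longest_run(sequence, char):
    """Length of the longest uninterrupted run of a single character."""
    runs = re.findall(f"(?:{re.escape(char)})+", sequence)
    return max((len(run) for run in runs), default=0)


def invalid_characters(sequence, alphabet="ATGCN-"):
    """Characters in the sequence that are not part of the alphabet."""
    return set(sequence) - set(alphabet)


# Shortened stand-ins for the example records (assumption: 10 columns, not 100).
toy_fasta = """>Sequence_A
ATGAGATCCC
>Sequence_B
ATGXGATCCC
>Sequence_C
ATGA--TCCC
"""

records = read_fasta(toy_fasta)
widths = {len(seq) for seq in records.values()}
print("number of sequences:", len(records))   # constraint 1
print("distinct widths:", widths)             # constraint 2
for name, seq in records.items():
    print(
        name,
        "invalid:", invalid_characters(seq),  # constraint 3
        "longest gap:", longest_run(seq, "-"),  # constraint 4
        "longest N run:", longest_run(seq, "N"),  # constraint 5
    )
```

In this toy data the X in Sequence_B and the two-column gap in Sequence_C are the kinds of problems that constraints 3 and 4 are meant to catch.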

We can write these tests in a Python file, example.py:

from phytest import Alignment, Sequence, Tree


def test_alignment_has_4_sequences(alignment: Alignment):
    alignment.assert_length(4)


def test_alignment_has_a_width_of_100(alignment: Alignment):
    alignment.assert_width(100)


def test_sequences_only_contains_the_characters(sequence: Sequence):
    sequence.assert_valid_alphabet(alphabet="ATGCN-")


def test_single_base_deletions(sequence: Sequence):
    sequence.assert_longest_stretch_gaps(max=1)


def test_longest_stretch_of_Ns_is_10(sequence: Sequence):
    sequence.assert_longest_stretch_Ns(max=10)


def test_tree_has_4_tips(tree: Tree):
    tree.assert_number_of_tips(4)


def test_tree_is_bifurcating(tree: Tree):
    tree.assert_is_bifurcating()


def test_aln_tree_match_names(alignment: Alignment, tree: Tree):
    aln_names = [i.name for i in alignment]
    tree.assert_tip_names(aln_names)


def test_all_internal_branches_lengths_above_threshold(tree: Tree, threshold=1e-4):
    tree.assert_internal_branch_lengths(min=threshold)


def test_outlier_branches(tree: Tree):
    # Here we create a custom function to detect outliers
    import statistics

    tips = tree.get_terminals()
    branch_lengths = [t.branch_length for t in tips]
    cut_off = statistics.mean(branch_lengths) + statistics.stdev(branch_lengths)
    for tip in tips:
        assert tip.branch_length < cut_off, f"Outlier tip '{tip.name}' (branch length = {tip.branch_length})!"
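The cut-off used by test_outlier_branches can be worked through by hand. The sketch below repeats the same arithmetic with the standard library only, using the tip branch lengths read off example.tree (1, 0.2, 0.3 and 0.4). The tip_lengths dictionary is a stand-in introduced for this illustration; the real test obtains the lengths from the tree object. Note that statistics.stdev is the sample standard deviation.

```python
import statistics

# Tip branch lengths copied from example.tree (stand-in for tree.get_terminals()).
tip_lengths = {
    "Sequence_A": 1.0,
    "Sequence_B": 0.2,
    "Sequence_C": 0.3,
    "Sequence_D": 0.4,
}

lengths = list(tip_lengths.values())

# Same rule as the test: a tip is an outlier if its branch length is not
# strictly below mean + one sample standard deviation.
cut_off = statistics.mean(lengths) + statistics.stdev(lengths)
outliers = [name for name, length in tip_lengths.items() if length >= cut_off]

print(round(cut_off, 3))  # 0.834
print(outliers)           # ['Sequence_A']
```

With only four tips, a single long branch shifts both the mean and the standard deviation, so a one-standard-deviation rule is quite strict; a different threshold may suit other datasets better.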

Running Phytest

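We can then run these tests on our data with phytest, passing the alignment with -s and the tree with -t (in this command the files are stored under examples/):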

phytest examples/example.py -s examples/data/example.fasta -t examples/data/example.tree

Generate a report by adding --report report.html.


From the output we can see several tests failed:

FAILED examples/example.py::test_sequences_only_contains_the_characters[Sequence_B] - AssertionError: Invalid pattern found in 'Sequence_B'!
FAILED examples/example.py::test_single_base_deletions[Sequence_C] - AssertionError: Longest stretch of '-' in 'Sequence_C' > 1!
FAILED examples/example.py::test_longest_stretch_of_Ns_is_10[Sequence_D] - AssertionError: Longest stretch of 'N' in 'Sequence_D' > 10!
FAILED examples/example.py::test_outlier_branches - AssertionError: Outlier tip 'Sequence_A' (branch length = 1.0)!

Results (0.07s):
    15 passed
    4 failed
        - examples/example.py:12 test_sequences_only_contains_the_characters[Sequence_B]
        - examples/example.py:16 test_single_base_deletions[Sequence_C]
        - examples/example.py:20 test_longest_stretch_of_Ns_is_10[Sequence_D]
        - examples/example.py:32 test_outlier_branches
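The totals in the summary can be reconstructed with a little arithmetic. This sketch assumes, as the bracketed test IDs such as [Sequence_B] suggest, that every test taking a sequence argument is run once per record in the alignment, while alignment and tree tests run once each.

```python
n_sequences = 4

alignment_tests = 2  # test_alignment_has_4_sequences, test_alignment_has_a_width_of_100
sequence_tests = 3   # alphabet, gap and N-run checks, each run once per sequence
tree_tests = 5       # tips, bifurcating, name matching, internal branches, outliers

total = alignment_tests + sequence_tests * n_sequences + tree_tests
failed = 4
passed = total - failed

print(total, passed, failed)  # 19 15 4
```

This matches the 15 passed and 4 failed reported above.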

See the documentation for more information: https://phytest-devs.github.io/phytest

Citation

If you use phytest, please cite the following paper:

Wytamma Wirth, Simon Mutch, Robert Turnbull, Sebastian Duchene, Phytest: quality control for phylogenetic analyses, Bioinformatics, Volume 38, Issue 22, 15 November 2022, Pages 5124–5125, https://doi.org/10.1093/bioinformatics/btac664

@article{10.1093/bioinformatics/btac664,
    author = {Wirth, Wytamma and Mutch, Simon and Turnbull, Robert and Duchene, Sebastian},
    title = "{{Phytest: quality control for phylogenetic analyses}}",
    journal = {Bioinformatics},
    volume = {38},
    number = {22},
    pages = {5124-5125},
    year = {2022},
    month = {10},
    issn = {1367-4803},
    doi = {10.1093/bioinformatics/btac664},
    url = {https://doi.org/10.1093/bioinformatics/btac664},
    eprint = {https://academic.oup.com/bioinformatics/article-pdf/38/22/5124/47153886/btac664.pdf},
}

