Quality control for phylogenetic pipelines using pytest
Phytest: Quality Control for Phylogenetic Analyses.
Documentation: https://phytest-devs.github.io/phytest
Code: https://github.com/phytest-devs/phytest
Installation
Install phytest using pip:
pip install phytest
Usage
Phytest is a tool for automating quality control checks on sequence, tree and metadata files during phylogenetic analyses. Phytest ensures that phylogenetic analyses meet user-defined quality control tests.
Here we will create example data files to run our tests on.
Create an alignment file in FASTA format, example.fasta:
>Sequence_A
ATGAGATCCCCGATAGCGAGCTAGCGATCGCAGCGACTCAGCAGCTACAGCGCAGAGGAGAGAGAGGCCCCTATTTACTAGAGCTCCAGATATAGNNTAG
>Sequence_B
ATGAGATCCCCGATAGCGAGCTAGXGATCGCAGCGACTCAGCAGCTACAGCGCAGAGGAGAGAGAGGCCCCTATTTACTAGAGCTCCAGATATAGNNTAG
>Sequence_C
ATGAGA--CCCGATAGCGAGCTAGCGATCGCAGCGACTCAGCAGCTACAGCGCAGAGGAGAGAGAGGCCCCTATTTACTAGAGCTCCAGATATAGNNTAG
>Sequence_D
ATGAGATCCCCGATAGCGAGCTAGCGATNNNNNNNNNNNNNNNNNTACAGCGCAGAGGAGAGAGAGGCCCCTATTTACTAGAGCTCCAGATATAGNNTAG
Create a tree file in Newick format, example.tree:
(Sequence_A:1,Sequence_B:0.2,(Sequence_C:0.3,Sequence_D:0.4):0.5);
Writing a test file
We want to enforce the following constraints on our data:
- The alignment has 4 sequences
- The sequences have a length of 100
- The sequences only contain the characters A, T, G, C, N and -
- The sequences are only allowed to contain single-base deletions
- The longest stretch of Ns is at most 10
- The tree has 4 tips
- The tree is bifurcating
- There are no outlier branches in the tree
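For intuition, the per-sequence constraints in this list can be expressed in plain Python without phytest. The sketch below is only an illustration: the helper names `invalid_characters` and `longest_stretch` and the toy sequences are hypothetical, and this is not necessarily how phytest implements its checks.

```python
import re

ALLOWED = set("ATGCN-")  # alphabet taken from the constraint list


def invalid_characters(seq: str) -> set:
    """Return the characters in seq that fall outside the allowed alphabet."""
    return set(seq.upper()) - ALLOWED


def longest_stretch(seq: str, char: str) -> int:
    """Return the length of the longest run of char in seq (0 if absent)."""
    runs = re.findall(f"{re.escape(char)}+", seq)
    return max((len(run) for run in runs), default=0)


# Toy sequences (hypothetical, much shorter than the example data)
toy = {
    "ok": "ATG-CNNTAG",
    "bad_char": "ATGXCGATAG",
    "double_gap": "ATG--GATAG",
}

for name, seq in toy.items():
    print(
        name,
        "invalid:", sorted(invalid_characters(seq)),
        "longest gap:", longest_stretch(seq, "-"),
        "longest N run:", longest_stretch(seq, "N"),
    )
```

A sequence satisfies the constraints when `invalid_characters` returns an empty set, the longest gap is at most 1, and the longest run of N is at most 10.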
With phytest, we can write these constraints as tests in a Python file, example.py:
from phytest import Alignment, Sequence, Tree


def test_alignment_has_4_sequences(alignment: Alignment):
    alignment.assert_length(4)


def test_alignment_has_a_width_of_100(alignment: Alignment):
    alignment.assert_width(100)


def test_sequences_only_contains_the_characters(sequence: Sequence):
    sequence.assert_valid_alphabet(alphabet="ATGCN-")


def test_single_base_deletions(sequence: Sequence):
    sequence.assert_longest_stretch_gaps(max=1)


def test_longest_stretch_of_Ns_is_10(sequence: Sequence):
    sequence.assert_longest_stretch_Ns(max=10)


def test_tree_has_4_tips(tree: Tree):
    tree.assert_number_of_tips(4)


def test_tree_is_bifurcating(tree: Tree):
    tree.assert_is_bifurcating()


def test_outlier_branches(tree: Tree):
    # Here we create a custom function to detect outliers
    import statistics

    tips = tree.get_terminals()
    branch_lengths = [t.branch_length for t in tips]
    cut_off = statistics.mean(branch_lengths) + statistics.stdev(branch_lengths)
    for tip in tips:
        assert tip.branch_length < cut_off, f"Outlier tip '{tip.name}' (branch length = {tip.branch_length})!"
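To make the custom outlier test concrete, the cut-off can be worked through by hand for the four tip branch lengths in example.tree. The following standalone sketch uses only the standard library and copies the branch lengths from the Newick string; the dictionary is an illustrative stand-in for the tips that the tree object would provide.

```python
import statistics

# Tip branch lengths copied from example.tree
# (the internal branch of length 0.5 is not a tip, so it is excluded)
branch_lengths = {
    "Sequence_A": 1.0,
    "Sequence_B": 0.2,
    "Sequence_C": 0.3,
    "Sequence_D": 0.4,
}

values = list(branch_lengths.values())
mean = statistics.mean(values)    # 0.475
stdev = statistics.stdev(values)  # sample standard deviation, about 0.359
cut_off = mean + stdev            # about 0.834

# A tip is flagged when its branch length is not strictly below the cut-off
outliers = [name for name, length in branch_lengths.items() if length >= cut_off]
print(f"cut-off = {cut_off:.3f}, outliers = {outliers}")
# prints: cut-off = 0.834, outliers = ['Sequence_A']
```

With only four tips, a cut-off of one standard deviation above the mean is fairly strict; a larger multiple of the standard deviation may suit bigger trees better.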
Running Phytest
We can then run these tests on our data with phytest:
phytest examples/example.py -s examples/data/example.fasta -t examples/data/example.tree
Generate a report by adding --report report.html.
From the output we can see several tests failed:
FAILED examples/example.py::test_sequences_only_contains_the_characters[Sequence_B] - AssertionError: Invalid pattern found in 'Sequence_B'!
FAILED examples/example.py::test_single_base_deletions[Sequence_C] - AssertionError: Longest stretch of '-' in 'Sequence_C' > 1!
FAILED examples/example.py::test_longest_stretch_of_Ns_is_10[Sequence_D] - AssertionError: Longest stretch of 'N' in 'Sequence_D' > 10!
FAILED examples/example.py::test_outlier_branches - AssertionError: Outlier tip 'Sequence_A' (branch length = 1.0)!
Results (0.07s):
13 passed
4 failed
- examples/example.py:12 test_sequences_only_contains_the_characters[Sequence_B]
- examples/example.py:16 test_single_base_deletions[Sequence_C]
- examples/example.py:20 test_longest_stretch_of_Ns_is_10[Sequence_D]
- examples/example.py:32 test_outlier_branches
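The totals in this summary can be reconciled with the test file. The bracketed sequence names in the output suggest that each test taking a Sequence runs once per sequence in the alignment; assuming that is the case, the arithmetic is sketched below.

```python
# Tests in example.py, grouped by the argument they request
alignment_tests = 2  # run once for the whole alignment
sequence_tests = 3   # assumed to run once per sequence
tree_tests = 3       # run once for the whole tree
n_sequences = 4      # records in example.fasta

total = alignment_tests + sequence_tests * n_sequences + tree_tests
failed = 4           # number of FAILED lines in the output
passed = total - failed
print(f"{total} tests: {passed} passed, {failed} failed")
# prints: 17 tests: 13 passed, 4 failed
```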
See the documentation for more information: https://phytest-devs.github.io/phytest