Universal ortholog-based phylogenomic toolkit.
phyca: phylogeny and collinearity aware assembly evaluation toolkit.
phyca is built around Compleasm and draws on the NCBI Genome database. For a query assembly, phyca improves the precision of BUSCO/Compleasm annotations by up to 7%, makes syntenic comparisons to public reference genomes, and rapidly places the assembly on a broad, precomputed phylogeny.
Installation
pip install phyca
phyca is distributed through PyPI and GitHub. A working installation of Compleasm (including SEPP and pplacer) is required for full functionality. I recommend creating a conda environment, installing Compleasm there first, and then installing phyca in the same environment, e.g.,
# create environment
conda create -n phyca python=3.9.19
# activate the environment so the following installs land in it
conda activate phyca
# install compleasm
conda install bioconda::compleasm=0.2.6
# install phyca
pip install phyca
Note that as of 02/03/2025, there is a known issue with pplacer and SEPP on Debian-based systems. A working solution is provided here.
phyca has the following non-exhaustive dependency structure.
Python (tested with 3.9.19)
 ↓
 ├───numpy (tested with 2.0.2)
 ├───pandas (tested with 2.2.3)
 ├───matplotlib (tested with 3.9.4)
 ├───seaborn (tested with 0.13.2)
 ├───SciPy (tested with 1.13.1)
 ├───BioNick (tested with 0.0.7)
 └───Compleasm (tested with 0.2.6)
      ├───hmmer (tested with 3.1b2)
      ├───miniprot (tested with 0.13-r248)
      │    └───libgcc (tested with 14.2.0 under conda)
      └───SEPP (tested with 4.4.0)
           └───pplacer and guppy (v1.1.alpha19-0-g807f6f3)
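Before running phyca, it can help to confirm which of these dependencies are visible in the active environment. The following is a minimal sketch and not part of phyca itself: it reads installed Python package versions through `importlib.metadata` and looks for the external tools on `PATH`. The distribution names and the executable names (`compleasm`, `hmmsearch`, `miniprot`, `run_sepp.py`, `pplacer`, `guppy`) are assumptions and may differ on your system.

```python
from importlib import metadata
import shutil

# Distribution names as assumed to be registered with pip; adjust if needed.
PY_PACKAGES = ["numpy", "pandas", "matplotlib", "seaborn", "scipy", "BioNick"]
# Executable names are assumptions; adjust them to match your installation.
EXECUTABLES = ["compleasm", "hmmsearch", "miniprot", "run_sepp.py", "pplacer", "guppy"]


def check_environment(packages=PY_PACKAGES, executables=EXECUTABLES):
    """Return (versions, paths) for the given packages and executables.

    Missing items map to None instead of raising an error.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    paths = {exe: shutil.which(exe) for exe in executables}
    return versions, paths


if __name__ == "__main__":
    versions, paths = check_environment()
    for name, version in versions.items():
        print(f"{name}: {version or 'not installed'}")
    for exe, path in paths.items():
        print(f"{exe}: {path or 'not found on PATH'}")
```

Comparing the printed versions against the "tested with" versions above is a quick first step when something fails at import time.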
Usage
phyca supports 10 BUSCO lineages: viridiplantae, liliopsida, eudicots, chlorophyta, fungi, ascomycota, basidiomycota, metazoa, arthropoda and vertebrata.
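When phyca is called from a script, checking the lineage name up front avoids starting a long run with a mistyped value. The helper below is a hypothetical sketch, not part of phyca; it assumes the lineage names are passed in lowercase exactly as listed above.

```python
# The ten lineage names listed above.
SUPPORTED_LINEAGES = (
    "viridiplantae", "liliopsida", "eudicots", "chlorophyta", "fungi",
    "ascomycota", "basidiomycota", "metazoa", "arthropoda", "vertebrata",
)


def normalize_lineage(name):
    """Return the lowercase lineage name, or raise ValueError if unsupported.

    Hypothetical helper for wrapper scripts; phyca may do its own validation.
    """
    cleaned = name.strip().lower()
    if cleaned not in SUPPORTED_LINEAGES:
        raise ValueError(
            f"Unsupported lineage {name!r}; choose one of: "
            + ", ".join(SUPPORTED_LINEAGES)
        )
    return cleaned
```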
A simple run on a query assembly would be:
phyca -a <assembly file> -l <lineage>
The Compleasm output folder can also be used as input if Compleasm output was previously generated:
phyca -c <compleasm_directory> -l <lineage>
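The two invocations above can be wrapped for use in a pipeline. This is a minimal sketch under the assumption that phyca is on `PATH`; only the `-a`, `-c` and `-l` flags shown above are used, and the function name and the file name `genome.fasta` are placeholders.

```python
import shlex
import subprocess


def build_phyca_command(lineage, assembly=None, compleasm_dir=None):
    """Build the argument list for a single-assembly phyca run.

    Exactly one of `assembly` (-a) or `compleasm_dir` (-c) must be given.
    """
    if (assembly is None) == (compleasm_dir is None):
        raise ValueError("Provide exactly one of assembly or compleasm_dir")
    cmd = ["phyca", "-l", lineage]
    if assembly is not None:
        cmd += ["-a", str(assembly)]
    else:
        cmd += ["-c", str(compleasm_dir)]
    return cmd


if __name__ == "__main__":
    # "genome.fasta" is a placeholder path.
    cmd = build_phyca_command("eudicots", assembly="genome.fasta")
    print(shlex.join(cmd))
    # Uncomment to actually run phyca (requires a working installation):
    # subprocess.run(cmd, check=True)
```

Passing the command as a list rather than a single string avoids shell quoting problems with paths that contain spaces.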
The above run will output BUSCO, CUSCO (Curated USCOs with higher precision) and MUSCO (remaining USCOs) statistics and graphs. It will compare the query to chromosome-level genome assemblies from NCBI Genome and output a table with a measure of synteny against each genome. It will also output a Neighbor-Joining tree based on BUSCO synteny. Finally, it will place the assembly on a large precomputed phylogeny for the lineage and graph the observed decay in BUSCO synteny against inferred phylogenetic distance.
phyca can also be used to compute the syntenic distance between two assemblies with the -s flag.
phyca -l <lineage> -s -a <assembly1> -r <assembly2>
The same comparison can be done by pointing to the Compleasm output directories, if already available.
phyca -l <lineage> -s -c <assembly1_compdir> -m <assembly2_compdir>
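To compare several assemblies against each other, the pairwise command can be generated for every pair. The sketch below only builds argument lists from the flags shown above (`-s`, `-a`/`-r`, `-c`/`-m`); the function names are hypothetical, and actually running the commands (for example with `subprocess.run`) is left to the caller.

```python
from itertools import combinations


def build_pairwise_command(lineage, first, second, from_compleasm=False):
    """Build the phyca argument list for a two-assembly synteny comparison.

    With from_compleasm=False, `first` and `second` are assembly files
    (-a / -r); with from_compleasm=True they are Compleasm output
    directories (-c / -m).
    """
    flags = ("-c", "-m") if from_compleasm else ("-a", "-r")
    return ["phyca", "-l", lineage, "-s",
            flags[0], str(first), flags[1], str(second)]


def all_pairs_commands(lineage, inputs, from_compleasm=False):
    """Return one command per unordered pair of inputs."""
    return [
        build_pairwise_command(lineage, a, b, from_compleasm)
        for a, b in combinations(inputs, 2)
    ]
```

For n inputs this yields n(n-1)/2 commands, so three assemblies produce three comparisons.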
UniPhyDB
The bulk data used by phyca is hosted by AGI's AVA cluster. Precomputed trees and more information are available at https://UniPhydb.github.io/.
Example Output
USCO graph:
Synteny decay plot:
Placement tree snippet: