Universal ortholog-based phylogenomic toolkit.

Project description

phyca: a phylogeny- and collinearity-aware assembly evaluation toolkit.

phyca is built around Compleasm and draws on the NCBI Genome database. For a query assembly, phyca improves the precision of BUSCO/Compleasm annotations by up to 7%, compares its synteny against public reference genomes, and rapidly places the assembly on a broad, precomputed phylogeny.

Installation

pip install phyca

phyca is distributed through PyPI and GitHub. A working installation of Compleasm (including SEPP and pplacer) is required for full functionality. I recommend creating a conda environment, installing Compleasm into it first, and then installing phyca in the same environment, e.g.:

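# create the environment and activate it
conda create -n phyca python=3.9.19
conda activate phyca
# install compleasm (Bioconda packages also need the conda-forge channel)
conda install -c conda-forge bioconda::compleasm=0.2.6
# install phyca into the same environment
pip install phyca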
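Because much of phyca's functionality depends on external programs, it can help to confirm that they are visible in the activated environment before a first run. The sketch below uses only the Python standard library; the executable names (compleasm, hmmsearch, miniprot, run_sepp.py, pplacer, guppy) are assumptions based on typical Bioconda packaging and may differ on your system, and missing_tools is a hypothetical helper, not part of phyca.

```python
import shutil

# Executable names are assumptions based on typical Bioconda packaging;
# adjust them if your installation names the tools differently.
REQUIRED_TOOLS = ("compleasm", "hmmsearch", "miniprot", "run_sepp.py", "pplacer", "guppy")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools (in the given order) that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All external tools were found on PATH.")
```

Run it with the phyca environment activated; an empty result only means the executables were found, not that they work correctly.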

Note that as of 02/03/2025, there is a known issue with pplacer and SEPP on Debian-based systems. A working solution is provided here.

phyca has the following (non-exhaustive) dependency structure:

Python (tested with 3.9.19)
├── numpy (tested with 2.0.2)
├── pandas (tested with 2.2.3)
├── matplotlib (tested with 3.9.4)
├── seaborn (tested with 0.13.2)
├── SciPy (tested with 1.13.1)
├── BioNick (tested with 0.0.7)
└── Compleasm (tested with 0.2.6)
    ├── hmmer (tested with 3.1b2)
    ├── miniprot (tested with 0.13-r248)
    │   └── libgcc (tested with 14.2.0 under conda)
    └── SEPP (tested with 4.4.0)
        └── pplacer and guppy (v1.1.alpha19-0-g807f6f3)
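To compare an existing environment with the tested Python package versions listed above, installed versions can be read with importlib.metadata from the standard library. A differing version is not necessarily a problem; the tree only records what was tested. compare_versions is a hypothetical helper, not part of phyca, and the non-Python tools (hmmer, miniprot, SEPP, pplacer) are not covered by it.

```python
from importlib.metadata import PackageNotFoundError, version

# "tested with" versions of the Python packages in the dependency tree.
TESTED_VERSIONS = {
    "numpy": "2.0.2",
    "pandas": "2.2.3",
    "matplotlib": "3.9.4",
    "seaborn": "0.13.2",
    "scipy": "1.13.1",
    "BioNick": "0.0.7",
}


def compare_versions(tested=TESTED_VERSIONS):
    """Map each package to (installed version or None, tested version)."""
    report = {}
    for package, tested_version in tested.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[package] = (installed, tested_version)
    return report


if __name__ == "__main__":
    for package, (installed, tested_version) in compare_versions().items():
        if installed is None:
            status = "missing"
        elif installed == tested_version:
            status = "ok"
        else:
            status = "differs"
        print(f"{package:12} installed={installed} tested={tested_version} [{status}]")
```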

Usage

phyca supports 10 BUSCO lineages: viridiplantae, liliopsida, eudicots, chlorophyta, fungi, ascomycota, basidiomycota, metazoa, arthropoda and vertebrata.
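Because phyca supports only the ten lineages listed above, a wrapper script may want to reject other names before launching a long run. A minimal sketch follows; check_lineage is a hypothetical helper, and lowercasing the input is an assumption about how phyca matches lineage names.

```python
# The ten lineages listed in the phyca documentation.
SUPPORTED_LINEAGES = frozenset({
    "viridiplantae", "liliopsida", "eudicots", "chlorophyta", "fungi",
    "ascomycota", "basidiomycota", "metazoa", "arthropoda", "vertebrata",
})


def check_lineage(name: str) -> str:
    """Normalize a lineage name; raise ValueError if phyca does not list it."""
    normalized = name.strip().lower()  # assumption: names are lowercase
    if normalized not in SUPPORTED_LINEAGES:
        raise ValueError(
            f"Unsupported lineage {name!r}; choose one of: "
            + ", ".join(sorted(SUPPORTED_LINEAGES))
        )
    return normalized
```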

A simple run on a query assembly would be:

phyca -a <assembly file> -l <lineage>

If Compleasm has already been run on the assembly, its output directory can be used as input instead:

phyca -c <compleasm_directory> -l <lineage>
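For scripted or batch runs, phyca can be launched from Python. The sketch below only assembles the command lines shown above (the -a, -c and -l flags documented on this page) and hands them to subprocess; build_phyca_command and run_phyca are hypothetical helpers, not part of phyca's API. It treats -a and -c as alternatives, as the usage above presents them.

```python
import subprocess
from typing import List, Optional


def build_phyca_command(lineage: str, assembly: Optional[str] = None,
                        compleasm_dir: Optional[str] = None) -> List[str]:
    """Build a phyca command from an assembly OR an existing Compleasm directory."""
    if (assembly is None) == (compleasm_dir is None):
        raise ValueError("Provide exactly one of `assembly` or `compleasm_dir`.")
    if assembly is not None:
        # Start from the assembly; Compleasm is run as part of the phyca run.
        return ["phyca", "-a", assembly, "-l", lineage]
    # Reuse previously generated Compleasm output.
    return ["phyca", "-c", compleasm_dir, "-l", lineage]


def run_phyca(cmd: List[str]) -> None:
    """Run phyca; check=True raises CalledProcessError on a non-zero exit status."""
    subprocess.run(cmd, check=True)
```

For example, run_phyca(build_phyca_command("fungi", assembly="genome.fa")) would run phyca on a hypothetical file genome.fa; processing several assemblies is then a plain for loop over file names.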

Either run outputs statistics and graphs for BUSCOs, CUSCOs (curated USCOs with higher precision) and MUSCOs (the remaining USCOs). It then compares the query to chromosome-level genome assemblies from NCBI Genome, outputs a table with a measure of synteny against each genome, and builds a Neighbor-Joining tree based on BUSCO synteny. Finally, it places the assembly on a large precomputed phylogeny for the lineage and plots the observed decay in BUSCO synteny against inferred phylogenetic distance.
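The Neighbor-Joining step mentioned above can be illustrated with the textbook Neighbor-Joining algorithm of Saitou and Nei. This is a generic sketch, not phyca's implementation: it assumes you already have a symmetric matrix of pairwise synteny-based distances (how phyca derives those distances is not described here) and returns an unrooted tree in Newick format. Leaf labels are assumed to be Newick-safe (no spaces, commas, colons or parentheses).

```python
import numpy as np


def neighbor_joining(dist, names):
    """Build an unrooted Neighbor-Joining tree and return it as a Newick string.

    dist:  symmetric (n x n) distance matrix with a zero diagonal, n >= 2.
    names: list of n Newick-safe leaf labels.
    """
    d = np.asarray(dist, dtype=float)
    nodes = list(names)
    if len(nodes) == 2:
        half = d[0, 1] / 2
        return f"({nodes[0]}:{half:.4g},{nodes[1]}:{half:.4g});"
    while len(nodes) > 3:
        n = len(nodes)
        r = d.sum(axis=1)
        # Q(i, j) = (n - 2) * d(i, j) - r_i - r_j; join the pair minimizing Q.
        q = (n - 2) * d - r[:, None] - r[None, :]
        np.fill_diagonal(q, np.inf)
        i, j = np.unravel_index(np.argmin(q), q.shape)
        # Branch lengths from the new internal node to i and j.
        li = 0.5 * d[i, j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i, j] - li
        joined = f"({nodes[i]}:{li:.4g},{nodes[j]}:{lj:.4g})"
        # Distance from the new node to every remaining node.
        du = 0.5 * (d[i] + d[j] - d[i, j])
        keep = [k for k in range(n) if k not in (i, j)]
        new_d = np.zeros((n - 1, n - 1))
        new_d[:-1, :-1] = d[np.ix_(keep, keep)]
        new_d[-1, :-1] = du[keep]
        new_d[:-1, -1] = du[keep]
        d = new_d
        nodes = [nodes[k] for k in keep] + [joined]
    # Three nodes left: connect them to a single central node.
    la = 0.5 * (d[0, 1] + d[0, 2] - d[1, 2])
    lb = 0.5 * (d[0, 1] + d[1, 2] - d[0, 2])
    lc = 0.5 * (d[0, 2] + d[1, 2] - d[0, 1])
    return f"({nodes[0]}:{la:.4g},{nodes[1]}:{lb:.4g},{nodes[2]}:{lc:.4g});"
```

For an additive (perfectly tree-like) distance matrix, Neighbor-Joining recovers the generating tree; for noisy synteny-derived distances it yields an approximation.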

phyca can also be used to compute the syntenic distance between two assemblies with the -s flag.

phyca -l <lineage> -s -a <assembly1> -r <assembly2>

If Compleasm output is already available for both assemblies, the same comparison can be run on the Compleasm output directories:

phyca -l <lineage> -s -c <assembly1_compdir> -m <assembly2_compdir>
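This page does not spell out how phyca quantifies synteny. As a purely illustrative example of what a BUSCO-based syntenic distance can look like, the sketch below measures the fraction of neighboring BUSCO pairs in one assembly that are also neighbors in the other. The input format (BUSCO id mapped to sequence name and start coordinate) and the function names are hypothetical, and phyca's own metric may differ.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Set, Tuple

# Hypothetical input: BUSCO id -> (sequence name, start coordinate).
Locations = Dict[str, Tuple[str, int]]


def adjacencies(locs: Locations, shared: Set[str]) -> Set[FrozenSet[str]]:
    """Unordered pairs of shared BUSCOs that are neighbors on the same sequence."""
    by_seq = defaultdict(list)
    for busco, (seq, start) in locs.items():
        if busco in shared:
            by_seq[seq].append((start, busco))
    pairs = set()
    for genes in by_seq.values():
        genes.sort()  # order BUSCOs along each sequence by start coordinate
        for (_, a), (_, b) in zip(genes, genes[1:]):
            pairs.add(frozenset((a, b)))
    return pairs


def adjacency_distance(query: Locations, reference: Locations) -> float:
    """1 - fraction of the query's BUSCO adjacencies also found in the reference."""
    shared = set(query) & set(reference)
    q_adj = adjacencies(query, shared)
    if not q_adj:
        return float("nan")  # no adjacencies to compare
    r_adj = adjacencies(reference, shared)
    return 1.0 - len(q_adj & r_adj) / len(q_adj)
```

As written, the measure is not symmetric: swapping query and reference can change the value, because the denominator counts only the query's adjacencies.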

UniPhyDB

The bulk data used by phyca is hosted on AGI's AVA cluster. Precomputed trees and further information are available at https://UniPhydb.github.io/.

Example Output

USCO graph: [image not available]

Synteny decay plot: [image not available]

Placement tree snippet: [image not available]

Download files

Source Distribution

phyca-0.0.2.tar.gz (69.9 kB)

Uploaded Source

Built Distribution

phyca-0.0.2-py3-none-any.whl (67.3 kB)

Uploaded Python 3

File details

Details for the file phyca-0.0.2.tar.gz.

File metadata

  • Download URL: phyca-0.0.2.tar.gz
  • Upload date:
  • Size: 69.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.3

File hashes

Hashes for phyca-0.0.2.tar.gz
Algorithm Hash digest
SHA256 4839065c5c8dfa68056d7acb2e628f5ced3e62a4b1cc5626ccabd03a5ec11d25
MD5 ee62d80e914fc1a3efe1844c77063e60
BLAKE2b-256 2e0b2d13a7667ffa6a1dc886d4710ef58038985d4c5111ed40220a7af0aefb08

File details

Details for the file phyca-0.0.2-py3-none-any.whl.

File metadata

  • Download URL: phyca-0.0.2-py3-none-any.whl
  • Upload date:
  • Size: 67.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.3

File hashes

Hashes for phyca-0.0.2-py3-none-any.whl
Algorithm Hash digest
SHA256 b90e96f008ddd254e979489dfe5bdb122bcee6e9be67cedc68cf3f75a19d8029
MD5 d14c0185b021e32bdfd459bae3e1a870
BLAKE2b-256 e020729c45f0b8ae402988da4c37e855aec7b9447e36f8f6e1fe638b0a721f10
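To check that a downloaded file matches the SHA256 digests published above, the file can be hashed locally. A minimal sketch with Python's hashlib follows; it assumes the file keeps its published name, and sha256_of and matches_published are hypothetical helpers, not part of phyca.

```python
import hashlib
from pathlib import Path

# SHA256 digests as published in the file details for phyca 0.0.2.
PUBLISHED_SHA256 = {
    "phyca-0.0.2.tar.gz": "4839065c5c8dfa68056d7acb2e628f5ced3e62a4b1cc5626ccabd03a5ec11d25",
    "phyca-0.0.2-py3-none-any.whl": "b90e96f008ddd254e979489dfe5bdb122bcee6e9be67cedc68cf3f75a19d8029",
}


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path):
    """True if the file's SHA256 equals the digest published for its file name."""
    expected = PUBLISHED_SHA256[Path(path).name]
    return sha256_of(path) == expected
```

For example, matches_published("phyca-0.0.2-py3-none-any.whl") should return True for an intact copy of the wheel in the current directory.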
