
Universal ortholog-based phylogenomic toolkit.

Project description

phyca: phylogeny and collinearity aware assembly evaluation toolkit.

phyca is built around Compleasm and uses reference assemblies from the NCBI Genome database. For a query assembly, phyca improves the precision of BUSCO/Compleasm annotations by up to 7%, measures synteny against public reference genomes, and rapidly places the assembly on a broad, precomputed phylogeny.

Installation

pip install phyca

phyca is distributed through PyPI and GitHub. A working installation of Compleasm (including SEPP and pplacer) is required for full functionality. I recommend creating a conda environment, installing Compleasm into it first, and then installing phyca in the same environment, e.g.,

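# create and activate the environment
conda create -n phyca python=3.9.19
conda activate phyca
# install compleasm (bioconda packages also need the conda-forge channel)
conda install -c conda-forge bioconda::compleasm=0.2.6
# install phyca
pip install phyca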

Note that as of 02/03/2025, there is a known issue with pplacer and SEPP on Debian-based systems. A working solution is provided here.

phyca has the following nonexhaustive dependency structure.

Python (tested with 3.9.19)
↓
├───numpy (tested with 2.0.2)
├───pandas (tested with 2.2.3)
├───matplotlib (tested with 3.9.4)
├───seaborn (tested with 0.13.2)
├───SciPy (tested with 1.13.1)
├───BioNick (tested with 0.0.7)
└───Compleasm (tested with 0.2.6)
        ├─── hmmer (tested with 3.1b2)
        ├─── miniprot (tested with 0.13-r248)
        │      └─── libgcc (tested with 14.2.0 under conda)
        └─── SEPP (tested with 4.4.0)
               └─── pplacer and guppy (v1.1.alpha19-0-g807f6f3)
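To check whether an environment roughly matches the tested setup above, a short Python script can report installed package versions and whether the external tools are on PATH. This is a minimal sketch and not part of phyca: the distribution names and the executable names (compleasm, hmmsearch, miniprot, run_sepp.py, pplacer, guppy) are assumptions about a typical conda install and may need adjusting.

```python
import shutil
from importlib.metadata import PackageNotFoundError, version

# Versions listed as "tested with" in the dependency tree above.
TESTED_PYTHON_PACKAGES = {
    "numpy": "2.0.2",
    "pandas": "2.2.3",
    "matplotlib": "3.9.4",
    "seaborn": "0.13.2",
    "scipy": "1.13.1",
    "BioNick": "0.0.7",
}

# Assumed executable names for the external tools; adjust if your install differs.
EXTERNAL_TOOLS = ["compleasm", "hmmsearch", "miniprot", "run_sepp.py", "pplacer", "guppy"]


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def report(packages=TESTED_PYTHON_PACKAGES, tools=EXTERNAL_TOOLS):
    """Return (name, tested_version, found, status) rows for packages and tools."""
    rows = []
    for name, tested in packages.items():
        found = installed_version(name)
        if found is None:
            status = "missing"
        elif found == tested:
            status = "ok"
        else:
            status = "differs"  # installed, but not the tested version
        rows.append((name, tested, found, status))
    for tool in tools:
        path = shutil.which(tool)  # full path if the executable is on PATH
        rows.append((tool, None, path, "ok" if path else "missing"))
    return rows


if __name__ == "__main__":
    for name, tested, found, status in report():
        print(f"{name:12} tested={(tested or '-'):8} found={found or '-'}  [{status}]")
```

A "differs" status is not necessarily a problem; it only flags versions other than the ones listed above.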

Usage

phyca supports 10 BUSCO lineages: viridiplantae, liliopsida, eudicots, chlorophyta, fungi, ascomycota, basidiomycota, metazoa, arthropoda and vertebrata.

A simple run on a query assembly looks like this:

phyca -a <assembly file> -l <lineage>

If Compleasm has already been run on the assembly, its output directory can be used as input instead:

phyca -c <compleasm_directory> -l <lineage>
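When scripting many runs, it can help to validate arguments before calling phyca. The sketch below only assembles the -a/-c and -l flags shown above; the helper name build_phyca_command is hypothetical, and it assumes lineage names are passed in lowercase exactly as listed and that -a and -c are used as alternatives, as in the two examples.

```python
# The ten lineages listed above (assumed to be passed in lowercase).
SUPPORTED_LINEAGES = {
    "viridiplantae", "liliopsida", "eudicots", "chlorophyta", "fungi",
    "ascomycota", "basidiomycota", "metazoa", "arthropoda", "vertebrata",
}


def build_phyca_command(lineage, assembly=None, compleasm_dir=None):
    """Hypothetical helper: build the argv list for a single-assembly phyca run."""
    if lineage not in SUPPORTED_LINEAGES:
        raise ValueError(f"unsupported lineage: {lineage!r}")
    # Exactly one input source: an assembly file (-a) or a Compleasm directory (-c).
    if (assembly is None) == (compleasm_dir is None):
        raise ValueError("give exactly one of assembly or compleasm_dir")
    if assembly is not None:
        return ["phyca", "-a", assembly, "-l", lineage]
    return ["phyca", "-c", compleasm_dir, "-l", lineage]


if __name__ == "__main__":
    import shlex

    # Print the equivalent shell command instead of running it.
    print(shlex.join(build_phyca_command("fungi", assembly="genome.fa")))
```

The returned list can be passed to subprocess.run(cmd, check=True); shlex.join(cmd) gives the equivalent shell command for logging.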

This run outputs BUSCO, CUSCO (Curated USCOs, with higher precision) and MUSCO (the remaining USCOs) statistics and graphs. It compares the query to chromosome-level genome assemblies from NCBI Genome and outputs a table with a measure of synteny against each genome, along with a Neighbor-Joining tree based on BUSCO synteny. Finally, it places the assembly on a large precomputed phylogeny for the lineage and plots the observed decay in BUSCO synteny against inferred phylogenetic distance.
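This page does not say how phyca converts synteny scores into distances or which Neighbor-Joining implementation it uses. As a generic illustration only, here is the textbook Saitou-Nei neighbor-joining algorithm in plain Python. It takes a hypothetical symmetric distance matrix (for instance, one minus a pairwise synteny score, which is an assumption) and returns an unrooted Newick string; it is a sketch, not phyca's code.

```python
from itertools import combinations


def neighbor_joining(names, matrix):
    """Textbook neighbor-joining; returns an unrooted Newick string.

    names  -- unique taxon labels
    matrix -- symmetric distance matrix (list of lists) in the same order
    """
    if len(names) < 3:
        raise ValueError("need at least three taxa")
    # Working distances keyed by unordered pairs of (sub)tree labels.
    d = {}
    for i, a in enumerate(names):
        for j, b in enumerate(names):
            if i < j:
                d[frozenset((a, b))] = float(matrix[i][j])

    def dist(a, b):
        return d[frozenset((a, b))]

    nodes = list(names)
    while len(nodes) > 3:
        n = len(nodes)
        # r[a]: total distance from a to every other active node.
        r = {a: sum(dist(a, b) for b in nodes if b != a) for a in nodes}
        # Join the pair minimising Q(a, b) = (n - 2) d(a, b) - r(a) - r(b);
        # min() keeps the first minimum on ties.
        a, b = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * dist(*p) - r[p[0]] - r[p[1]])
        # Branch lengths from a and b to their new parent u.
        la = 0.5 * dist(a, b) + (r[a] - r[b]) / (2 * (n - 2))
        lb = dist(a, b) - la
        u = f"({a}:{la:g},{b}:{lb:g})"
        # Distances from the new node to every remaining node.
        for k in nodes:
            if k not in (a, b):
                d[frozenset((u, k))] = 0.5 * (dist(a, k) + dist(b, k) - dist(a, b))
        nodes = [k for k in nodes if k not in (a, b)] + [u]

    # Connect the last three nodes to a single central node.
    x, y, z = nodes
    lx = 0.5 * (dist(x, y) + dist(x, z) - dist(y, z))
    ly = dist(x, y) - lx
    lz = dist(x, z) - lx
    return f"({x}:{lx:g},{y}:{ly:g},{z}:{lz:g});"


if __name__ == "__main__":
    # Additive distances from the tree ((A:1,B:2):1,(C:3,D:4)).
    taxa = ["A", "B", "C", "D"]
    m = [[0, 3, 5, 6],
         [3, 0, 6, 7],
         [5, 6, 0, 7],
         [6, 7, 7, 0]]
    print(neighbor_joining(taxa, m))  # (C:3,D:4,(A:1,B:2):1);
```

For additive distances like the toy matrix, neighbor-joining recovers the generating tree exactly; real synteny-derived distances will generally not be additive, and NJ can then return small negative branch lengths.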

phyca can also be used to compute the syntenic distance between two assemblies with the -s flag.

phyca -l <lineage> -s -a <assembly1> -r <assembly2>

The same comparison can be run on existing Compleasm output directories:

phyca -l <lineage> -s -c <assembly1_compdir> -m <assembly2_compdir>
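The exact syntenic distance phyca reports is not described here. As an illustration only, one common way to quantify conserved gene order is a breakpoint-style score: the fraction of neighbouring BUSCO pairs in one assembly that are no longer neighbours in the other. The sketch below assumes each assembly is given as a hypothetical mapping from sequence name to the BUSCO IDs on that sequence, already sorted by position (for example, parsed from Compleasm output); the function names are hypothetical and this is not phyca's implementation.

```python
def adjacencies(chromosomes, keep):
    """Unordered pairs of BUSCOs that are neighbours on the same sequence.

    Genes not in `keep` are dropped first, so neighbourhood is judged only
    among BUSCOs present in both assemblies. Orientation is ignored.
    """
    pairs = set()
    for genes in chromosomes.values():
        order = [g for g in genes if g in keep]
        for left, right in zip(order, order[1:]):
            if left != right:  # skip tandem copies of the same BUSCO
                pairs.add(frozenset((left, right)))
    return pairs


def breakpoint_distance(asm1, asm2):
    """Fraction of asm1's BUSCO adjacencies not conserved in asm2 (0 = same order)."""
    genes1 = {g for genes in asm1.values() for g in genes}
    genes2 = {g for genes in asm2.values() for g in genes}
    shared = genes1 & genes2
    adj1 = adjacencies(asm1, shared)
    if not adj1:
        raise ValueError("no shared BUSCO adjacencies to compare")
    adj2 = adjacencies(asm2, shared)
    return 1.0 - len(adj1 & adj2) / len(adj1)


if __name__ == "__main__":
    query = {"chr1": ["b1", "b2", "b3", "b4"]}
    reference = {"scaf1": ["b1", "b2"], "scaf2": ["b4", "b3"]}
    print(breakpoint_distance(query, reference))
```

In the example, the b2/b3 adjacency is split across two reference scaffolds, so one of the query's three adjacencies is lost and the score is about 0.33; the reversed b4/b3 pair still counts as conserved. The score is normalised by the first assembly's adjacencies, so it is not symmetric in general.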

UniPhyDB

The bulk data used by phyca is hosted on AGI's AVA cluster. Precomputed trees and more information are available at https://UniPhydb.github.io/ .

Example Output

USCO graph:

Synteny decay plot:

Placement tree snippet:

Download files

Source Distribution

phyca-0.0.1.tar.gz (15.3 kB)

Uploaded Source

Built Distribution

phyca-0.0.1-py3-none-any.whl (12.8 kB)

Uploaded Python 3

File details

Details for the file phyca-0.0.1.tar.gz.

File metadata

  • Download URL: phyca-0.0.1.tar.gz
  • Upload date:
  • Size: 15.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.3

File hashes

Hashes for phyca-0.0.1.tar.gz
Algorithm Hash digest
SHA256 e04665265a333fe47941dfdf35cc1f1887a4b8c49ea1381c799ae94a52b8713b
MD5 b0942e02b330f51b1c3cc2570b0cac4f
BLAKE2b-256 9a9effa7202043eee3b7a8a8a10bfa4e07a587c8e47faeab36dd53c07bb6e5b9

File details

Details for the file phyca-0.0.1-py3-none-any.whl.

File metadata

  • Download URL: phyca-0.0.1-py3-none-any.whl
  • Upload date:
  • Size: 12.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.3

File hashes

Hashes for phyca-0.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 25449f539b353a4b486b2b643c9948a50375e2ed2b820fba99b02816d4ad4705
MD5 3c5d429ede152140021c8c03d146c4f5
BLAKE2b-256 27c2d9162ed8315356118d0ad80d4097f3e27628ad266dc20ac5d2277f31987e
