An embedding-based phage protein annotation tool by hierarchical assignment

Empathi
Embedding-based Phage Protein Annotation Tool by Hierarchical Assignment

Table of Contents
  1. About the Project
  2. Getting Started
  3. Usage details

About the Project

Empathi is a tool for predicting the functions of bacteriophage proteins. It makes its predictions from the highly informative ProtT5 protein embeddings. In addition, new functional groups were defined that are better suited to machine learning than the often-overlapping PHROG categories.

A preprint is available here.

Getting Started

Empathi is available on PyPI and as an Apptainer container for ease of use.
The source code can also be downloaded from HuggingFace.

Prerequisites

The full list of dependencies and versions can be found in requirements.txt. Dependencies are taken care of by pip and Apptainer. See instructions below.

python/3.11.5
joblib==1.2.0
numpy==1.26.4
pandas==2.2.1
torch==2.3.0
scipy==1.13.1
scikit-learn==1.5.0
transformers==4.43.1
sentencepiece==0.2.0
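
If predictions fail after installation, it can help to compare the installed packages against the pins above. The following is a minimal sketch, not part of Empathi: the helper names `installed_version` and `find_mismatches` are made up for illustration, and it assumes that a local build suffix such as `+cu121` on torch can be ignored when comparing versions.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the dependency list above.
PINNED = {
    "joblib": "1.2.0",
    "numpy": "1.26.4",
    "pandas": "2.2.1",
    "torch": "2.3.0",
    "scipy": "1.13.1",
    "scikit-learn": "1.5.0",
    "transformers": "4.43.1",
    "sentencepiece": "0.2.0",
}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def find_mismatches(pinned, lookup=installed_version):
    """Map each package that deviates from its pin to (expected, found)."""
    mismatches = {}
    for package, expected in pinned.items():
        found = lookup(package)
        # Assumption: drop a local build tag such as "+cu121" before comparing.
        normalized = found.split("+")[0] if found else None
        if normalized != expected:
            mismatches[package] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for package, (expected, found) in find_mismatches(PINNED).items():
        print(f"{package}: expected {expected}, found {found or 'not installed'}")
```

An empty result means every pinned package is installed at the listed version.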

Installation

There are three ways of installing Empathi: through PyPI, as an Apptainer container or as source code.

1. PIP

First, create a virtual environment with Python 3.11.5.

conda create -n empathi_env python=3.11.5
conda activate empathi_env

Download models for Empathi:

git lfs install
git clone https://huggingface.co/AlexandreBoulay/empathi
export PATH="/path/to/empathi/models:$PATH"

Install Empathi (pip also installs its dependencies):

pip install empathi

Usage

python
from empathi import empathi
empathi.empathi("input_file", "name", output_folder="path/to/output")

2. Apptainer

Install Apptainer or Singularity. On Windows, this requires a virtual machine; WSL works well.

Fetch Empathi from Sylabs:

apptainer pull empathi.sif library://alexandreboulay/empathi/empathi

Launch Empathi:

apptainer run empathi.sif path/to/input_file name

3. From source code

First, create a virtual environment with Python 3.11.5.

conda create -n empathi_env python=3.11.5
conda activate empathi_env

Clone the repository:

git lfs install
git clone https://huggingface.co/AlexandreBoulay/empathi

Install dependencies:

cd empathi
pip install -r requirements.txt

Usage

python src/empathi/empathi.py input_file name

Usage details

A FASTA file of protein sequences or a file of protein embeddings (.pkl or .csv) can be used as input.

Specifying the option --only_embeddings computes the embeddings and skips functional prediction. This step is much faster with a GPU. The resulting embeddings file can then be passed back to the same command (without --only_embeddings) as the input file.
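
The two-step workflow described above could be scripted roughly as follows. This is a sketch under assumptions: it targets the source-code install (`src/empathi/empathi.py`), the function names are hypothetical, and the path of the embeddings file written by the first step is not documented here, so the caller has to supply it as `embeddings_path`.

```python
import subprocess
import sys

# Script path from the "From source code" install; adjust to your checkout.
SCRIPT = "src/empathi/empathi.py"


def embedding_command(fasta_path, name):
    """Build the command for step 1: compute embeddings only."""
    return [sys.executable, SCRIPT, fasta_path, name, "--only_embeddings"]


def prediction_command(embeddings_path, name):
    """Build the command for step 2: predict functions from saved embeddings."""
    return [sys.executable, SCRIPT, embeddings_path, name]


def run_two_step(fasta_path, embeddings_path, name):
    """Run both steps in order, stopping if either one fails.

    embeddings_path is a caller-supplied assumption: check the output folder
    after step 1 to find the file that was actually written.
    """
    # Distinct names per step, so the second run does not overwrite the first.
    subprocess.run(embedding_command(fasta_path, name + "_embeddings"), check=True)
    subprocess.run(prediction_command(embeddings_path, name), check=True)
```

Building the commands as lists, rather than as one shell string, avoids quoting problems when paths contain spaces.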

Options:

  • input_file: Path to the input file containing protein sequences (.fa*) or protein embeddings (.pkl/.csv).
  • name: Name of the file you want to save to (without extension). Should differ between runs to avoid overwriting files.
  • --models_folder: Path to the folder containing the Empathi models. Can be left unspecified if it was added to PATH earlier.
  • --only_embeddings: Whether to only calculate embeddings (no functional prediction).
  • --output_folder: Path to the output folder. Default is ./empathi_out/.
  • --mode: Which types of proteins you want to predict. Accepted arguments are "all", "pvp", "rbp", "lysin", "regulator"...
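
Two of the rules in this list, the accepted input extensions and the advice to use a different name for each run, can be checked before launching a long job. The sketch below is illustrative only: `input_kind` and `unique_name` are hypothetical helpers, and how Empathi itself inspects file extensions is not stated here.

```python
from datetime import datetime
from pathlib import Path


def input_kind(path):
    """Classify an input file by extension, following the input_file rule above."""
    suffix = Path(path).suffix.lower()
    if suffix.startswith(".fa"):  # .fa, .faa, .fasta, ...
        return "sequences"
    if suffix in {".pkl", ".csv"}:
        return "embeddings"
    raise ValueError(f"unsupported input extension: {suffix!r}")


def unique_name(prefix, now=None):
    """Append a timestamp so repeated runs do not overwrite each other."""
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    return f"{prefix}_{stamp}"
```

For instance, `unique_name("phage42")` gives a different value for runs started at least one second apart, which is usually enough to keep output files separate.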

When launching from Python, omit the '--' in front of option names and pass them as keyword arguments.
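
As a small illustration of that rule, command-line style options can be converted to keyword arguments mechanically. `cli_to_kwargs` is a hypothetical helper, not part of Empathi; the commented-out call mirrors the Python usage shown in the installation section.

```python
def cli_to_kwargs(options):
    """Strip the leading dashes from each option name."""
    return {name.lstrip("-"): value for name, value in options.items()}


kwargs = cli_to_kwargs({"--output_folder": "path/to/output", "--mode": "pvp"})

# With Empathi installed, the converted options would be passed like this:
# from empathi import empathi
# empathi.empathi("input_file", "name", **kwargs)
```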
