An embedding-based phage protein annotation tool by hierarchical assignment
Empathi
Embedding-based Phage Protein Annotation Tool by Hierarchical Assignment
About the Project
Empathi is a tool for predicting bacteriophage protein functions. It uses the highly informative ProtT5 protein embeddings to make predictions. In addition, new functional groups were defined that are better suited to machine learning than the often-overlapping PHROG categories.
A preprint is available here.
Getting Started
Empathi has been packaged on PyPI and as an Apptainer container for ease of use.
The source code can also be downloaded from HuggingFace.
Prerequisites
The full list of dependencies and versions can be found in requirements.txt. Dependencies are taken care of by pip and Apptainer. See instructions below.
python/3.11.5
joblib==1.2.0
numpy==1.26.4
pandas==2.2.1
torch==2.3.0
scipy==1.13.1
scikit-learn==1.5.0
transformers==4.43.1
sentencepiece==0.2.0
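When installing from source, it can help to confirm that the environment actually matches the pins above. The following is a minimal sketch using only the standard library; the `PINS` dictionary is copied from the list above, and the helper name `check_pins` is hypothetical (it is not part of Empathi).

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the dependency list above (Python itself is not checked here).
PINS = {
    "joblib": "1.2.0",
    "numpy": "1.26.4",
    "pandas": "2.2.1",
    "torch": "2.3.0",
    "scipy": "1.13.1",
    "scikit-learn": "1.5.0",
    "transformers": "4.43.1",
    "sentencepiece": "0.2.0",
}


def check_pins(pins):
    """Return {package: (expected, found)} for every package that deviates.

    `found` is None when the package is not installed. The comparison is an
    exact string match, so a build suffix such as "+cu121" is reported too.
    """
    mismatches = {}
    for name, expected in pins.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            found = None
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in check_pins(PINS).items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

An empty result means every listed package is installed at exactly the pinned version.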
Installation
There are three ways of installing Empathi: through PyPI, as an Apptainer container or as source code.
1. PIP
First, create a virtual environment with Python 3.11.5.
conda create -n empathi_env python=3.11.5
conda activate empathi_env
Download models for Empathi:
git lfs install
git clone https://huggingface.co/AlexandreBoulay/empathi
export PATH="/path/to/empathi/models:$PATH"
Install Empathi (pip also installs its dependencies):
pip install empathi
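Since the models folder is located through PATH, a quick check that the export took effect can save a failed run. This is a rough sketch, not part of Empathi: the helper name `models_on_path` is hypothetical, and `/path/to/empathi/models` is the placeholder path from the export command above.

```python
import os
from pathlib import Path


def models_on_path(models_dir, path_value=None):
    """Return True if `models_dir` is one of the entries of PATH.

    `path_value` defaults to the current PATH; pass a string to test another value.
    """
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    target = Path(models_dir).expanduser().resolve()
    # Skip empty entries, which appear when PATH has a leading or doubled separator.
    entries = [entry for entry in path_value.split(os.pathsep) if entry]
    return any(Path(entry).expanduser().resolve() == target for entry in entries)


if __name__ == "__main__":
    print(models_on_path("/path/to/empathi/models"))
```

If this prints `False`, the folder can still be passed explicitly with `--models_folder`.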
Usage
python
from empathi import empathi
empathi.empathi("input_file", "name", output_folder="path/to/output")
2. Apptainer
Install Apptainer or Singularity. On Windows, this requires a virtual machine; WSL works well.
Fetch Empathi from Sylabs:
apptainer pull empathi.sif library://alexandreboulay/empathi/empathi
Launch Empathi:
apptainer run empathi.sif path/to/input_file name
3. From source code
First, create a virtual environment with Python 3.11.5.
conda create -n empathi_env python=3.11.5
conda activate empathi_env
Clone the repo.
git lfs install
git clone https://huggingface.co/AlexandreBoulay/empathi
Install dependencies:
cd empathi
pip install -r requirements.txt
Usage
python src/empathi/empathi.py input_file name
Usage details
A FASTA file of protein sequences or a file of protein embeddings (.pkl or .csv) can be used as input.
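To make the two input types concrete, here is a small sketch that classifies a path by its extension, following the extensions listed under Options (.fa* for sequences, .pkl/.csv for embeddings). How Empathi itself detects the input type is not described here, so this is an assumption for illustration; `input_kind` is a hypothetical helper.

```python
from pathlib import Path


def input_kind(path):
    """Classify an input file as 'sequences' or 'embeddings' by its extension.

    Assumes ".fa*" means any suffix starting with ".fa" (.fa, .faa, .fasta, ...).
    """
    suffix = Path(path).suffix.lower()
    if suffix.startswith(".fa"):
        return "sequences"
    if suffix in {".pkl", ".csv"}:
        return "embeddings"
    raise ValueError(f"unsupported input extension: {suffix!r}")


if __name__ == "__main__":
    for example in ("proteins.faa", "embeddings.csv"):
        print(example, "->", input_kind(example))
```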
Specifying the option --only_embeddings computes the embeddings only, without functional prediction. This step is much faster with a GPU. The resulting embeddings file can then be passed back as the input file using the same command without --only_embeddings.
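The two-step workflow (embeddings first, predictions second) can be sketched as a pair of command lines. This only builds the commands for the from-source invocation shown earlier; it does not run them. The helper name `two_step_commands` is hypothetical, and the name of the embeddings file written by the first step is not stated here, so the caller has to supply it.

```python
import sys


def two_step_commands(fasta, embed_name, embeddings_file, predict_name,
                      script="src/empathi/empathi.py"):
    """Build the two command lines: embeddings only, then prediction.

    `embeddings_file` is the file written by the first step; its exact name is
    an assumption left to the caller. The two run names differ so that the
    second run does not overwrite the output of the first.
    """
    embed = [sys.executable, script, fasta, embed_name, "--only_embeddings"]
    predict = [sys.executable, script, embeddings_file, predict_name]
    return embed, predict


if __name__ == "__main__":
    for command in two_step_commands("input_file", "run_embed",
                                     "path/to/embeddings_file", "run_predict"):
        print(" ".join(command))
```

Each list can be passed to `subprocess.run(command, check=True)`, for example running the first on a GPU machine and the second elsewhere.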
Options:
- input_file: Path to the input file containing protein sequences (.fa*) or protein embeddings (.pkl/.csv).
- name: Name of the file to save results to (without extension). Should be different between runs to avoid overwriting files.
- --models_folder: Path to folder containing EmPATHi models. Can be left unspecified if it was added to PATH earlier.
- --only_embeddings: Whether to only calculate embeddings (no functional prediction).
- --output_folder: Path to the output folder. Default is ./empathi_out/.
- --mode: Which types of proteins you want to predict. Accepted arguments are "all", "pvp", "rbp", "lysin", "regulator"...
When calling Empathi from Python, pass these options as keyword arguments without the leading '--' (for example, output_folder instead of --output_folder).
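The mapping between command-line options and Python keyword arguments can be illustrated with a small converter. This is a sketch, not part of Empathi: `cli_to_kwargs` is a hypothetical helper, and treating a flag without a value (such as --only_embeddings) as `True` is an assumption about the Python function's signature.

```python
def cli_to_kwargs(options):
    """Convert ['--mode', 'pvp', '--only_embeddings'] into keyword arguments.

    An option followed by a value maps to that value; an option with no value
    is treated as a boolean flag and mapped to True (an assumption).
    """
    kwargs = {}
    i = 0
    while i < len(options):
        token = options[i]
        if not token.startswith("--"):
            raise ValueError(f"expected an option starting with '--', got {token!r}")
        key = token[2:]
        has_value = i + 1 < len(options) and not options[i + 1].startswith("--")
        if has_value:
            kwargs[key] = options[i + 1]
            i += 2
        else:
            kwargs[key] = True
            i += 1
    return kwargs


if __name__ == "__main__":
    kwargs = cli_to_kwargs(["--mode", "pvp", "--output_folder", "path/to/output"])
    print(kwargs)
    # With Empathi installed, the result could then be passed on as:
    # empathi.empathi("input_file", "name", **kwargs)
```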
Download files
File details
Details for the file empathi-1.0.2.tar.gz.
File metadata
- Download URL: empathi-1.0.2.tar.gz
- Upload date:
- Size: 31.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e43e1f5195d6715cdf20f8cbefc5392f2534080c1f391b20609181f82e641822 |
| MD5 | f81ac286cde890dc93f8ddf0e9f2f95e |
| BLAKE2b-256 | 6483c2e3796bcf1b78b94e368063c8990bafa5f2c552d9ae549f510d8c3d68f4 |
File details
Details for the file empathi-1.0.2-py3-none-any.whl.
File metadata
- Download URL: empathi-1.0.2-py3-none-any.whl
- Upload date:
- Size: 21.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 86f9ec264be405c6cf17dd84f39d100e42e2c6b1323d37a4579c786ed2dcd81c |
| MD5 | ebde0840051f98d5017c1d7afe03dfc8 |
| BLAKE2b-256 | 497f52b221b123a0c7c220164e1712a58982daa2b2d888b17930b421536bae2a |
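The SHA256 digests listed above can be used to check a manually downloaded file. The following is a minimal sketch with the standard library; the helper names `sha256_of` and `matches` are hypothetical, and the local file path is whatever the file was saved as.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    # Digest of empathi-1.0.2.tar.gz as listed above; the local path is an assumption.
    expected = "e43e1f5195d6715cdf20f8cbefc5392f2534080c1f391b20609181f82e641822"
    try:
        print(matches("empathi-1.0.2.tar.gz", expected))
    except FileNotFoundError:
        print("empathi-1.0.2.tar.gz not found in the current directory")
```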