An embedding-based phage protein annotation tool by hierarchical assignment
EmPATHi
Embedding-based Phage Protein Annotation Tool by Hierarchical assignment
About the Project
Little description.
Preprint can be found at: [link]
Getting Started
EmPATHi is distributed on PyPI and as an Apptainer container for ease of use.
The source code can also be downloaded from HuggingFace.
Prerequisites
The full list of dependencies, with the versions we tested for compatibility, is in requirements.txt. pip and Apptainer install the dependencies automatically. See instructions below.
python/3.11.5
joblib==1.2.0
numpy==1.26.4
pandas==2.2.1
matplotlib==3.9.0
torch==2.3.0
scipy==1.13.1
scikit-learn==1.5.0
transformers==4.43.1
sentencepiece==0.2.0
seaborn==0.13.2
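To confirm that an existing environment matches the pinned versions above, a small check along these lines can help. This is a minimal sketch, not part of EmPATHi: `PINS` and `check_pins` are hypothetical names introduced here, and the comparison is an exact string match, so a build such as `2.3.0+cu121` of torch would be reported as a mismatch.

```python
from importlib.metadata import PackageNotFoundError, version

# Pinned versions copied from the list above (the Python version is not checked here).
PINS = {
    "joblib": "1.2.0",
    "numpy": "1.26.4",
    "pandas": "2.2.1",
    "matplotlib": "3.9.0",
    "torch": "2.3.0",
    "scipy": "1.13.1",
    "scikit-learn": "1.5.0",
    "transformers": "4.43.1",
    "sentencepiece": "0.2.0",
    "seaborn": "0.13.2",
}


def check_pins(pins):
    """Return {package: (expected, found)} for packages that are missing or differ."""
    problems = {}
    for name, expected in pins.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            found = None  # package is not installed in this environment
        if found != expected:
            problems[name] = (expected, found)
    return problems


if __name__ == "__main__":
    for name, (expected, found) in check_pins(PINS).items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

An empty result means every listed package is installed at exactly the tested version.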
The models used by EmPATHi must be obtained separately. See instructions below.
The models folder for EmPATHi must be obtained from HuggingFace.
ProtT5 must also be downloaded from HuggingFace.
Installation
First, create a virtual environment with Python 3.11.5. This can be done using tools such as conda or virtualenv.
Download models for EmPATHi and ProtT5:
git lfs install
git clone https://huggingface.co/AlexandreBoulay/EmPATHi
git clone https://huggingface.co/Rostlab/prot_t5_xl_half_uniref50-enc Rostlab/prot_t5_xl_half_uniref50-enc
export PATH="/path/to/EmPATHi/models:$PATH"
export PATH="/path/to/Rostlab/prot_t5_xl_half_uniref50-enc:$PATH"
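The exports above place the model folders on PATH so that they can be found later without being passed explicitly. How EmPATHi performs that lookup is not shown here, so the following is only an illustration of the idea: `find_dir_on_path` is a hypothetical helper that scans PATH for an existing directory with a given name.

```python
import os
from pathlib import Path


def find_dir_on_path(folder_name, path_value=None):
    """Return the first PATH entry that is an existing directory named `folder_name`.

    Hypothetical helper for illustration; not EmPATHi's actual lookup code.
    Returns None when no entry matches.
    """
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    for entry in path_value.split(os.pathsep):
        if not entry:
            continue  # skip empty segments such as a trailing separator
        candidate = Path(entry)
        if candidate.name == folder_name and candidate.is_dir():
            return candidate
    return None


if __name__ == "__main__":
    # Prints the matching directory, or None if the export has not been run.
    print(find_dir_on_path("models"))
    print(find_dir_on_path("prot_t5_xl_half_uniref50-enc"))
```

If either call prints `None`, the corresponding export has not taken effect in the current shell.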
1. PIP
pip install empathi
2. Apptainer
3. From source code
Clone the repository if you have not already done so:
git lfs install
git clone https://huggingface.co/AlexandreBoulay/EmPATHi
Install dependencies:
cd EmPATHi
pip install -r requirements.txt
Usage
For pip:
python
from empathi import empathi
empathi(input_file, name, output_folder="path/to/output")
For Apptainer:
From command line:
python src/empathi/empathi.py -h
Options:
- input_file: Path to input file containing protein sequences (.fa*) or protein embeddings (.pkl/.csv).
- name: Name of the file to save results to (without extension). Use a different name for each run to avoid overwriting files.
- --models_folder: Path to folder containing EmPATHi models. Can be left unspecified if it was added to PATH earlier.
- --only_embeddings: Whether to only calculate embeddings (no functional prediction).
- --output_folder: Path to the output folder. Default is ./empathi_out/.
- --mode: Which types of proteins you want to predict. Accepted arguments are "all", "pvp", "rbp", "lysin", "regulator"...
When launching from Python, omit the leading '--' from the argument names.
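As an illustration of how the command-line options map onto the Python call, the sketch below strips the leading '--' from each flag and forwards the result as keyword arguments. `cli_to_kwargs` and `run` are hypothetical helpers introduced here, and the file and run names in the final comment are placeholders. It assumes the keyword names match the flag names, as the note above suggests.

```python
def cli_to_kwargs(cli_options):
    """Convert {'--flag': value} pairs to keyword arguments by dropping the leading '--'."""
    return {flag.removeprefix("--"): value for flag, value in cli_options.items()}


# Options written as they would appear on the command line.
options = cli_to_kwargs({
    "--output_folder": "empathi_out",
    "--mode": "pvp",
    "--only_embeddings": False,
})


def run(input_file, name, **kwargs):
    """Call EmPATHi from Python with the converted options."""
    # Imported lazily so cli_to_kwargs can be used without EmPATHi installed.
    from empathi import empathi
    return empathi(input_file, name, **kwargs)


# Example call (placeholder file and run names):
# run("proteins.faa", "run1", **options)
```

Keeping the options in a dictionary makes it easy to reuse the same settings across several runs while changing only `name`.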
Contact
Download files
File details
Details for the file empathi-1.0.0.tar.gz.
File metadata
- Download URL: empathi-1.0.0.tar.gz
- Upload date:
- Size: 202.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.10.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3d4c27fc3f1dccf5c44d98a04cf84510f53654c340f8b1d09adcc28685489e0c |
| MD5 | be2499243b148771ac562abf4165b512 |
| BLAKE2b-256 | 0695162524201c0016d9ec4f1d217237c77479cbf13609f7bfb4d638a726ba30 |
File details
Details for the file empathi-1.0.0-py3-none-any.whl.
File metadata
- Download URL: empathi-1.0.0-py3-none-any.whl
- Upload date:
- Size: 21.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.10.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d02ad3ef5f1219129c4c7b189c2637d92c690979be18a313bca452d112c65963 |
| MD5 | 3f3d7800c4e292b7de87b0a8350c262b |
| BLAKE2b-256 | ff1c7bdf4fbc297a6a3b5ea7a1317a067e6a1d523881538998a87f391afdc2ba |
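A downloaded file can be checked against the SHA256 digests listed above with the standard library. The following is a minimal sketch: `sha256_of` and `verify` are hypothetical helper names, and the script assumes the files sit in the current directory under their published names.

```python
import hashlib
import os

# SHA256 digests copied from the file details above.
EXPECTED_SHA256 = {
    "empathi-1.0.0.tar.gz":
        "3d4c27fc3f1dccf5c44d98a04cf84510f53654c340f8b1d09adcc28685489e0c",
    "empathi-1.0.0-py3-none-any.whl":
        "d02ad3ef5f1219129c4c7b189c2637d92c690979be18a313bca452d112c65963",
}


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected):
    """Return True when the file's SHA256 digest equals `expected` (case-insensitive)."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    for filename, expected in EXPECTED_SHA256.items():
        if os.path.exists(filename):
            print(filename, "OK" if verify(filename, expected) else "MISMATCH")
        else:
            print(filename, "not found in the current directory")
```

A mismatch usually indicates an incomplete or altered download, in which case the file should be fetched again.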