An embedding-based phage protein annotation tool by hierarchical assignment

EmPATHi
Embedding-based Phage Protein Annotation Tool by Hierarchical assignment

Table of Contents
  1. About the Project
  2. Getting Started
  3. Usage
  4. Contact

About the Project

Little description.

Preprint can be found at: [link]

Getting Started

EmPATHi is available on PyPI and as an Apptainer container for ease of use.
The source code can also be downloaded from Hugging Face.

Prerequisites

The full list of dependencies, with the versions we tested for compatibility, can be found in requirements.txt. pip and Apptainer install these dependencies automatically. See instructions below.

python/3.11.5
joblib==1.2.0
numpy==1.26.4
pandas==2.2.1
matplotlib==3.9.0
torch==2.3.0
scipy==1.13.1
scikit-learn==1.5.0
transformers==4.43.1
sentencepiece==0.2.0
seaborn==0.13.2
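
To confirm that an environment matches the pinned versions above, a small check along these lines may help. It is only a sketch: the `PINNED` dictionary and the `check_pins` helper are illustrative names, not part of EmPATHi, and the pins are copied by hand from the list above rather than read from requirements.txt.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the list above. The Python interpreter itself is not a
# pip package, so it is not included here.
PINNED = {
    "joblib": "1.2.0",
    "numpy": "1.26.4",
    "pandas": "2.2.1",
    "matplotlib": "3.9.0",
    "torch": "2.3.0",
    "scipy": "1.13.1",
    "scikit-learn": "1.5.0",
    "transformers": "4.43.1",
    "sentencepiece": "0.2.0",
    "seaborn": "0.13.2",
}


def check_pins(pins):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            # Drop a local build suffix such as "+cu121" before comparing.
            installed = version(name).split("+")[0]
        except PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in check_pins(PINNED).items():
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
```

An empty result means every listed package is installed at the pinned version.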

The models used by EmPATHi must be obtained separately. See instructions below.
The models folder for EmPATHi must be downloaded from Hugging Face.
ProtT5 must also be downloaded from Hugging Face.

Installation

First, create a virtual environment with Python 3.11.5. This can be done using tools such as conda or virtualenv.

Download models for EmPATHi and ProtT5:

git lfs install
git clone https://huggingface.co/AlexandreBoulay/EmPATHi
git clone https://huggingface.co/Rostlab/prot_t5_xl_half_uniref50-enc Rostlab/prot_t5_xl_half_uniref50-enc
export PATH="/path/to/EmPATHi/models:$PATH"
export PATH="/path/to/Rostlab/prot_t5_xl_half_uniref50-enc:$PATH"
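
After exporting the two folders, it can be useful to verify that they are on PATH and exist on disk. The sketch below does only that. How EmPATHi itself resolves these folders is not described here, and `find_on_path` is a hypothetical helper written for this example.

```python
import os
from pathlib import Path


def find_on_path(suffix, path_var=None):
    """Return PATH entries that end with `suffix` and exist as directories.

    `path_var` defaults to the current PATH; pass a string to check another value.
    """
    raw = os.environ.get("PATH", "") if path_var is None else path_var
    suffix_parts = Path(suffix).parts
    hits = []
    for entry in raw.split(os.pathsep):
        if not entry:
            continue
        candidate = Path(entry)
        # Compare trailing path components, e.g. (..., "EmPATHi", "models").
        if candidate.parts[-len(suffix_parts):] == suffix_parts and candidate.is_dir():
            hits.append(candidate)
    return hits


if __name__ == "__main__":
    for folder in ("EmPATHi/models", "Rostlab/prot_t5_xl_half_uniref50-enc"):
        found = find_on_path(folder)
        print(folder, "->", found if found else "not found on PATH")
```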

1. PIP

pip install empathi

2. Apptainer

3. From source code

Clone the repository if you have not already done so:

git lfs install
git clone https://huggingface.co/AlexandreBoulay/EmPATHi

Install dependencies:

cd EmPATHi
pip install -r requirements.txt

Usage

For pip:

python
from empathi import empathi
empathi(input_file, name, output_folder="path/to/output")
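
When annotating several files, each run needs its own `name` so that outputs are not overwritten. The following is a minimal sketch of one way to do that, assuming the call signature shown above. `run_name` and `annotate_all` are hypothetical helpers, and the annotation function is passed in as an argument so the loop can be tried without the models installed.

```python
from pathlib import Path


def run_name(input_file):
    """Derive a run name (without extension) from the input file name."""
    return Path(input_file).stem


def annotate_all(input_files, output_folder, annotate):
    """Call annotate(input_file, name, output_folder=...) once per input file."""
    names = [run_name(f) for f in input_files]
    if len(set(names)) != len(names):
        # Two inputs with the same stem would write to the same output name.
        raise ValueError("input files would produce duplicate run names")
    for input_file, name in zip(input_files, names):
        annotate(str(input_file), name, output_folder=str(output_folder))
    return names


# Intended use (file names here are placeholders):
#   from empathi import empathi
#   annotate_all(["phage_a.faa", "phage_b.faa"], "path/to/output", empathi)
```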

For Apptainer:

From command line:

python src/empathi/empathi.py -h

Options:

  • input_file: Path to the input file containing protein sequences (.fa*) or protein embeddings (.pkl/.csv).
  • name: Name of the file you want to save to (without extension). Should be different between runs to avoid overwriting files.
  • --models_folder: Path to folder containing EmPATHi models. Can be left unspecified if it was added to PATH earlier.
  • --only_embeddings: Whether to only calculate embeddings (no functional prediction).
  • --output_folder: Path to the output folder. Default is ./empathi_out/.
  • --mode: Which types of proteins you want to predict. Accepted arguments are "all", "pvp", "rbp", "lysin", "regulator"...

When calling EmPATHi from Python, omit the leading '--' from the argument names.
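
The naming rule can be illustrated with a short sketch that turns command-line style options into Python keyword arguments. `to_kwargs` is a hypothetical helper. That each Python keyword is the flag name without its dashes follows the note above, while passing `True` for a switch such as `--only_embeddings` is an assumption.

```python
def to_kwargs(cli_options):
    """Map {'--mode': 'pvp'} style options to {'mode': 'pvp'}."""
    return {
        (flag[2:] if flag.startswith("--") else flag): value
        for flag, value in cli_options.items()
    }


options = to_kwargs({"--output_folder": "path/to/output", "--mode": "pvp"})
print(options)

# Intended use, with input_file and name defined as in the usage above:
#   from empathi import empathi
#   empathi(input_file, name, **options)
```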

Contact
