FANTASIA
Functional ANnoTAtion based on embedding space SImilArity
FANTASIA is an advanced pipeline for the automatic functional annotation of protein sequences using state-of-the-art protein language models. It computes sequence embeddings with deep learning models and runs similarity searches in a vector database to assign Gene Ontology (GO) terms to proteins based on their most similar annotated references.
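The core idea of annotation by embedding similarity can be illustrated with a minimal, self-contained sketch. It is not FANTASIA's implementation: it assumes reference proteins are held in memory as (embedding, GO term set) pairs, uses a hypothetical `transfer_go_terms` helper, and performs a brute-force search where FANTASIA delegates the search to pgvector. Cosine distance is used because it is what pgvector's `<=>` operator returns; whether FANTASIA ranks neighbors by cosine or Euclidean distance (pgvector's `<->`) is an assumption.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; the same quantity pgvector's <=> operator returns."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


def transfer_go_terms(query, references, k=1):
    """Hypothetical helper: collect GO terms from the k nearest reference proteins.

    references is a list of (embedding, set_of_go_terms) pairs (assumed layout).
    Returns {go_term: distance of the closest neighbor annotated with it}.
    """
    scored = sorted(
        ((cosine_distance(query, emb), terms) for emb, terms in references),
        key=lambda pair: pair[0],
    )
    hits = {}
    for distance, terms in scored[:k]:
        for term in terms:
            # Neighbors are visited in ascending distance, so the first hit is the closest.
            hits.setdefault(term, distance)
    return hits


# Toy 3-dimensional embeddings; real protein embeddings have hundreds to thousands of dimensions.
references = [
    ([1.0, 0.0, 0.0], {"GO:0003824"}),
    ([0.0, 1.0, 0.0], {"GO:0005515"}),
]
print(sorted(transfer_go_terms([0.9, 0.1, 0.0], references, k=1)))  # ['GO:0003824']
```

With `k=1` only the closest reference contributes terms; a larger `k` trades precision for recall. In a database-backed setup, PostgreSQL would rank the neighbors itself with a query of the form `ORDER BY embedding <=> query LIMIT k` instead of sorting in Python.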
For full documentation, visit FANTASIA Documentation.
Key Features
- ✅ Advanced Embedding Models
  Supports the ProtT5, ProstT5, and ESM2 protein language models for sequence representation.
- 🔍 Redundancy Filtering
  Uses CD-HIT to filter out homologous sequences at an adjustable identity threshold, so redundancy levels can be controlled for reliable benchmarking and evaluation.
- 💾 Optimized Data Storage
  Embeddings of input sequences are stored in HDF5 format, while similarity lookups run against a vector database (PostgreSQL with the pgvector extension) for fast retrieval.
- 🚀 Efficient Similarity Lookup
  Performs fast nearest-neighbor searches with pgvector, enabling accurate annotation based on embedding similarity.
- 🔬 Functional Annotation by Similarity
  Assigns Gene Ontology (GO) terms to proteins based on embedding space similarity, leveraging embeddings from pre-trained models.
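CD-HIT clusters sequences above a sequence identity threshold (`-c`) and keeps one representative per cluster. The sketch below only shows how such a threshold maps onto a `cd-hit` command line; the helper names are hypothetical, and the exact program and options FANTASIA uses (for example, `cd-hit-2d` when comparing query sequences against a reference set) are assumptions not documented here. The word-size mapping follows the recommendations in the CD-HIT user guide.

```python
import shutil
import subprocess


def cdhit_word_size(identity):
    """Word length (-n) the CD-HIT user guide recommends for a protein identity threshold (-c)."""
    if identity >= 0.7:
        return 5
    if identity >= 0.6:
        return 4
    if identity >= 0.5:
        return 3
    if identity >= 0.4:
        return 2
    raise ValueError("cd-hit does not support protein identity thresholds below 0.4")


def build_cdhit_command(input_fasta, output_fasta, identity=0.9):
    """Hypothetical helper: build a cd-hit command clustering input_fasta at `identity`."""
    if identity > 1.0:
        raise ValueError("identity must be at most 1.0")
    return [
        "cd-hit",
        "-i", input_fasta,    # input FASTA
        "-o", output_fasta,   # representative sequences are written here
        "-c", str(identity),  # sequence identity threshold
        "-n", str(cdhit_word_size(identity)),  # word length matching the threshold
    ]


def run_cdhit(input_fasta, output_fasta, identity=0.9):
    """Run cd-hit if it is installed; raises CalledProcessError if it fails."""
    if shutil.which("cd-hit") is None:
        raise RuntimeError("cd-hit executable not found on PATH")
    subprocess.run(build_cdhit_command(input_fasta, output_fasta, identity), check=True)


print(" ".join(build_cdhit_command("proteins.fasta", "nr90.fasta", 0.9)))
# cd-hit -i proteins.fasta -o nr90.fasta -c 0.9 -n 5
```

Lowering the threshold removes more distant homologs as well, which makes a benchmark harder because fewer close relatives of each query remain.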
Pipeline Overview (Simplified)
- Embedding Generation
  Computes protein embeddings using deep learning models (ProtT5, ProstT5, and ESM2).
- GO Term Lookup
  Uses vector similarity searches in pgvector to assign Gene Ontology terms based on embedding similarity.
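Protein language models return one vector per residue, while a similarity lookup needs one vector per protein. A common way to obtain it, as bio_embeddings does for its per-protein ProtT5 embeddings, is to average the per-residue vectors. The sketch assumes mean pooling and ProtT5-style input formatting and leaves out model loading and inference; FANTASIA's own preprocessing and pooling for ProtT5, ProstT5, and ESM2 may differ (each model has its own input conventions), and the function names are hypothetical.

```python
import re


def prepare_prott5_input(sequence):
    """Format a sequence as in the ProtT5 model card before tokenization:
    rare residues U, Z, O and B become X, and residues are separated by spaces."""
    return " ".join(re.sub(r"[UZOB]", "X", sequence.upper()))


def mean_pool(residue_embeddings):
    """Average an L x D list of per-residue vectors into one D-dimensional per-protein vector."""
    if not residue_embeddings:
        raise ValueError("at least one residue embedding is required")
    length = len(residue_embeddings)
    # zip(*...) walks the D columns; each column holds one feature across all L residues.
    return [sum(column) / length for column in zip(*residue_embeddings)]


print(prepare_prott5_input("mkub"))         # M K X X
print(mean_pool([[1.0, 2.0], [3.0, 4.0]]))  # [2.0, 3.0]
```

Mean pooling keeps the per-protein vector the same size regardless of sequence length, which is what allows proteins of different lengths to be compared in a single vector index.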
Acknowledgments
FANTASIA is the result of a collaborative effort between Ana Rojas’s Lab (Andalusian Center for Developmental Biology, CSIC) and Rosa Fernández’s Lab (Metazoa Phylogenomics Lab, Institute of Evolutionary Biology, CSIC-UPF). This project demonstrates the synergy between research teams with diverse expertise.
This version of FANTASIA builds upon previous work from:
- Metazoa Phylogenomics Lab's FANTASIA
  The original implementation of FANTASIA for functional annotation.
- bio_embeddings
  A state-of-the-art framework for generating protein sequence embeddings.
- GoPredSim
  A similarity-based approach for Gene Ontology annotation.
- protein-metamorphisms-is
  Serves as the reference biological information system, providing a robust data model and curated datasets for protein structural and functional analysis.
We also extend our gratitude to LifeHUB-CSIC for inspiring this initiative and fostering innovation in computational biology.
Citing FANTASIA
If you use FANTASIA in your research, please cite the following publications:
- Martínez-Redondo, G. I., Barrios, I., Vázquez-Valls, M., Rojas, A. M., & Fernández, R. (2024).
  Illuminating the functional landscape of the dark proteome across the Animal Tree of Life.
  DOI: 10.1101/2024.02.28.582465
- Barrios-Núñez, I., Martínez-Redondo, G. I., Medina-Burgos, P., Cases, I., Fernández, R., & Rojas, A. M. (2024).
  Decoding proteome functional information in model organisms using protein language models.
  DOI: 10.1101/2024.02.14.580341
Contact
For inquiries, please contact the project team:
- Francisco Miguel Pérez Canales: fmpercan@upo.es (Developer)
- Gemma I. Martínez-Redondo: gemma.martinez@ibe.upf-csic.es
- Ana M. Rojas: a.rojas.m@csic.es
- Rosa Fernández: rosa.fernandez@ibe.upf-csic.es
Download files
- Source Distribution: fantasia-0.13.3.tar.gz
- Built Distribution: fantasia-0.13.3-py3-none-any.whl
File details
Details for the file fantasia-0.13.3.tar.gz.
File metadata
- Download URL: fantasia-0.13.3.tar.gz
- Upload date:
- Size: 18.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.4.0 CPython/3.10.16 Linux/6.8.0-1021-azure
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6c8bdb27012ffc57ab80895331ad4144ab189c4338ab1885860316f1bbada886 |
| MD5 | 9d930f5b0025a8f8175f8dab598a9f9a |
| BLAKE2b-256 | 6a4b68a4de8b36c52ea3a3f9281bd6723867e79d6d5e78f1e07b5d3a835e778d |
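The published digests can be used to check a downloaded file before installing it. A minimal sketch with Python's standard `hashlib`, assuming the archive has been saved to the current directory; the helper names are illustrative. pip can also print a file's SHA-256 with `pip hash <file>`, or enforce digests at install time with `--require-hashes`.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 in fixed-size chunks so large files need little memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_sha256):
    """True if the file's SHA-256 matches the published digest (case-insensitive)."""
    return sha256_of_file(path) == expected_sha256.strip().lower()


# SHA256 digest listed above for the source distribution.
EXPECTED = "6c8bdb27012ffc57ab80895331ad4144ab189c4338ab1885860316f1bbada886"
# verify_download("fantasia-0.13.3.tar.gz", EXPECTED)  # run after downloading the file
```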
File details
Details for the file fantasia-0.13.3-py3-none-any.whl.
File metadata
- Download URL: fantasia-0.13.3-py3-none-any.whl
- Upload date:
- Size: 20.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.4.0 CPython/3.10.16 Linux/6.8.0-1021-azure
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e76d1634157dd096912cc18f92296abc2da63c2a3bfcc57e9a117259ea55ff59 |
| MD5 | 9e42ecfc18de284bdaf2a9147b2cef4c |
| BLAKE2b-256 | b29b364d1ad3eebb41912b51cf998a1fee7d16beeb15e0178eaf6a82ee639de0 |