Functional ANnoTAtion based on embedding space SImilArity

FANTASIA

FANTASIA is an advanced pipeline designed for automatic functional annotation of protein sequences using state-of-the-art protein language models. It integrates deep learning embeddings and similarity searches in vector databases to associate Gene Ontology (GO) terms with proteins.

For full documentation, visit FANTASIA Documentation.

Key Features

  • ✅ Advanced Embedding Models
    Supports protein language models: ProtT5, ProstT5, and ESM2 for sequence representation.

  • 🔍 Redundancy Filtering
    Filters out homologous sequences using CD-HIT, allowing controlled redundancy levels through an adjustable threshold, ensuring reliable benchmarking and evaluation.

  • 💾 Optimized Data Storage
    Embeddings are stored in HDF5 format for input sequences, while similarity lookups are performed in a vector database (pgvector in PostgreSQL) for fast retrieval.

  • 🚀 Efficient Similarity Lookup
    Performs high-speed searches using pgvector, enabling accurate annotation based on embedding similarity.

  • 🔬 Functional Annotation by Similarity
    Assigns Gene Ontology (GO) terms to proteins based on sequence and structural similarity, leveraging pre-trained embeddings.
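
The redundancy-filtering feature above can be illustrated with a toy greedy clustering: sequences are visited longest first, and each one either joins the first representative it resembles at or above the threshold or becomes a new representative. This is only a sketch of the general idea. FANTASIA delegates this step to CD-HIT, and the `similarity` score below (Python's `difflib` ratio), the function names, and the toy sequences are all assumptions made for illustration, not part of FANTASIA or CD-HIT.

```python
from __future__ import annotations

from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    # Stand-in score in [0, 1]; CD-HIT itself uses alignment-based identity.
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def greedy_cluster(seqs: dict[str, str], threshold: float) -> dict[str, list[str]]:
    """Map each representative ID to the IDs of its cluster members."""
    clusters: dict[str, list[str]] = {}
    # Longest sequences are processed first and become representatives.
    for seq_id in sorted(seqs, key=lambda s: len(seqs[s]), reverse=True):
        for rep in clusters:
            if similarity(seqs[seq_id], seqs[rep]) >= threshold:
                clusters[rep].append(seq_id)
                break
        else:
            # No representative was similar enough: start a new cluster.
            clusters[seq_id] = [seq_id]
    return clusters


# Hypothetical toy sequences, not real proteins.
toy = {
    "P1": "MKTAYIAKQRQISFVKSHFSRQ",
    "P2": "MKTAYIAKQRQISFVKSHFSRA",  # differs from P1 in the last residue
    "P3": "GSSGSSGWWPLLNNDDEE",      # unrelated to P1 and P2
}
at_90 = greedy_cluster(toy, threshold=0.9)
at_100 = greedy_cluster(toy, threshold=1.0)
```

Lowering the threshold merges more sequences into each cluster, so fewer close relatives of a query remain available as annotation sources; raising it keeps near-duplicates apart.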

Pipeline Overview (Simplified)

  1. Embedding Generation
    Computes protein embeddings using deep learning models (ProtT5, ProstT5, and ESM2).

  2. GO Term Lookup
    Uses vector similarity searches in pgvector to assign Gene Ontology terms based on embedding similarity.
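
The two steps above can be sketched end to end in plain Python. Everything here is an assumption chosen for illustration: `toy_embed` replaces a real protein language model with simple amino-acid frequencies, the lookup is an in-memory scan rather than a pgvector query, cosine distance is assumed as the metric (pgvector exposes it through its `<=>` operator), and the reference sequences and their GO term assignments are made up.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_embed(seq):
    """Hypothetical stand-in for a PLM embedding: amino-acid frequencies.

    Assumes a non-empty sequence containing standard residues.
    """
    counts = Counter(seq)
    return [counts[aa] / len(seq) for aa in AMINO_ACIDS]


def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def transfer_go_terms(query_seq, reference, k=1, max_distance=0.5):
    """Return {GO term: distance} taken from the k nearest reference entries."""
    query_vec = toy_embed(query_seq)
    scored = sorted(
        ((cosine_distance(query_vec, vec), terms) for _, vec, terms in reference),
        key=lambda hit: hit[0],
    )
    result = {}
    for dist, terms in scored[:k]:
        if dist <= max_distance:
            for term in terms:
                # Hits are sorted, so the first distance seen is the smallest.
                result.setdefault(term, dist)
    return result


# Step 1: embed a made-up reference set (sequences and annotations are toy data).
reference = [
    ("R1", toy_embed("MKKLLKKAAKKLLKK"), {"GO:0003677"}),
    ("R2", toy_embed("GGSSGGSSPPGGSS"), {"GO:0016020"}),
]

# Step 2: look up the nearest neighbour and transfer its GO terms.
annotated = transfer_go_terms("MKKLAKKLLKKAAKK", reference)
unannotated = transfer_go_terms("WWWW", reference)
```

The `max_distance` cutoff is a hypothetical parameter: it shows how a query with no sufficiently close neighbour can be left unannotated instead of inheriting terms from a distant hit.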

Acknowledgments

FANTASIA is the result of a collaborative effort between Ana Rojas’s Lab (Andalusian Center for Developmental Biology, CSIC) and Rosa Fernández’s Lab (Metazoa Phylogenomics Lab, Institute of Evolutionary Biology, CSIC-UPF). This project demonstrates the synergy between research teams with diverse expertise.

This version of FANTASIA builds upon previous work from:

  • Metazoa Phylogenomics Lab's FANTASIA
    The original implementation of FANTASIA for functional annotation.

  • bio_embeddings
    A state-of-the-art framework for generating protein sequence embeddings.

  • GoPredSim
    A similarity-based approach for Gene Ontology annotation.

  • protein-metamorphisms-is
    Serves as the reference biological information system, providing a robust data model and curated datasets for protein structural and functional analysis.

We also extend our gratitude to LifeHUB-CSIC for inspiring this initiative and fostering innovation in computational biology.

Citing FANTASIA

If you use FANTASIA in your research, please cite the following publications:

  1. Martínez-Redondo, G. I., Barrios, I., Vázquez-Valls, M., Rojas, A. M., & Fernández, R. (2024).
    Illuminating the functional landscape of the dark proteome across the Animal Tree of Life.
    DOI: 10.1101/2024.02.28.582465

  2. Barrios-Núñez, I., Martínez-Redondo, G. I., Medina-Burgos, P., Cases, I., Fernández, R., & Rojas, A. M. (2024).
    Decoding proteome functional information in model organisms using protein language models.
    DOI: 10.1101/2024.02.14.580341

Contact

For inquiries, please contact the project team:

