FANTASIA

Functional ANnoTAtion based on embedding space SImilArity

FANTASIA is a pipeline for the automatic functional annotation of protein sequences using protein language models. It computes deep learning embeddings for the input sequences, retrieves reference vectors from a PostgreSQL database with pgvector, and runs in-memory similarity searches against them to associate Gene Ontology (GO) terms with proteins.

For full documentation, visit FANTASIA Documentation.

Key Features

✅ Available Embedding Models
Supports the protein language models ProtT5, ProstT5, and ESM2 for sequence representation.
🔍 Redundancy Filtering
Uses CD-HIT to filter homologous sequences out of the lookup table. An adjustable threshold controls the redundancy level, which supports reliable benchmarking and evaluation.
💾 Optimized Data Storage
Embeddings of the input sequences are stored in HDF5 format. The reference table is hosted in a public PostgreSQL database using pgvector.
🚀 Efficient Similarity Lookup
Performs high-speed searches using in-memory computations. Reference vectors are retrieved from a PostgreSQL database with pgvector for comparison.
🔬 Functional Annotation by Similarity
Assigns Gene Ontology (GO) terms to proteins based on embedding space similarity, using pre-trained embeddings.
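
The similarity lookup and annotation transfer described above can be illustrated with a minimal sketch. It is not FANTASIA's implementation: the function names, the choice of cosine similarity, the parameter `k`, and the toy 3-dimensional vectors are all assumptions made for illustration, and it uses only the Python standard library rather than the vectorized code a real pipeline would use.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def transfer_go_terms(query, references, k=1):
    """Collect the GO terms of the k references most similar to `query`.

    `references` is an in-memory list of (embedding, go_terms) pairs
    (hypothetical layout, standing in for rows fetched from the database).
    """
    ranked = sorted(
        references,
        key=lambda ref: cosine_similarity(query, ref[0]),
        reverse=True,
    )
    terms = []
    for _, go_terms in ranked[:k]:
        for term in go_terms:
            if term not in terms:  # keep first occurrence, preserve rank order
                terms.append(term)
    return terms


# Toy reference table; real embeddings have far more dimensions.
references = [
    ([1.0, 0.0, 0.0], ["GO:0003677"]),
    ([0.0, 1.0, 0.0], ["GO:0005524"]),
    ([0.0, 0.0, 1.0], ["GO:0016020"]),
]
query = [0.9, 0.1, 0.0]

nearest_terms = transfer_go_terms(query, references, k=1)
print(nearest_terms)
```

Raising `k` transfers terms from more neighbours, at the cost of pulling in annotations from less similar reference proteins.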

Pipeline Overview (Simplified)

Embedding Generation
Computes protein embeddings using deep learning models (ProtT5, ProstT5, and ESM2).
GO Term Lookup
Performs vector similarity searches using in-memory computations to assign Gene Ontology terms. Reference embeddings are retrieved from a PostgreSQL database with pgvector. Only experimental evidence codes are used for transfer.
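
The restriction to experimental evidence codes could look roughly like the following sketch. The set below contains the Gene Ontology experimental evidence codes (EXP, IDA, IPI, IMP, IGI, IEP); whether FANTASIA uses exactly this set, for example whether it also accepts the high-throughput codes, is an assumption, as are the record layout and the function name.

```python
# Assumption: GO experimental evidence codes; FANTASIA's exact set may differ.
EXPERIMENTAL_EVIDENCE_CODES = frozenset({"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"})


def experimental_only(annotations):
    """Keep only annotations backed by an experimental evidence code."""
    return [
        annotation
        for annotation in annotations
        if annotation["evidence_code"] in EXPERIMENTAL_EVIDENCE_CODES
    ]


# Hypothetical annotation records for one reference protein.
annotations = [
    {"go_id": "GO:0003677", "evidence_code": "IDA"},  # direct assay: kept
    {"go_id": "GO:0005524", "evidence_code": "IEA"},  # electronic: dropped
    {"go_id": "GO:0016020", "evidence_code": "ISS"},  # sequence similarity: dropped
    {"go_id": "GO:0005634", "evidence_code": "EXP"},  # experiment: kept
]

kept = experimental_only(annotations)
print([annotation["go_id"] for annotation in kept])
```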

Acknowledgments

FANTASIA is the result of a collaborative effort between Ana Rojas’ Lab (CBBIO) (Andalusian Center for Developmental Biology, CSIC) and Rosa Fernández’s Lab (Metazoa Phylogenomics Lab, Institute of Evolutionary Biology, CSIC-UPF). This project demonstrates the synergy between research teams with diverse expertise.

This version of FANTASIA builds upon previous work from:

Metazoa Phylogenomics Lab's FANTASIA
The original implementation of FANTASIA for functional annotation.
bio_embeddings
A state-of-the-art framework for generating protein sequence embeddings.
GoPredSim
A similarity-based approach for Gene Ontology annotation.
protein-metamorphisms-is
Serves as the reference biological information system, providing a robust data model and curated datasets for protein structural and functional analysis.

We also extend our gratitude to LifeHUB-CSIC for inspiring this initiative and fostering innovation in computational biology.

Citing FANTASIA

If you use FANTASIA in your research, please cite the following publications:

Martínez-Redondo, G. I., Barrios, I., Vázquez-Valls, M., Rojas, A. M., & Fernández, R. (2024).
Illuminating the functional landscape of the dark proteome across the Animal Tree of Life.
DOI: 10.1101/2024.02.28.582465
Barrios-Núñez, I., Martínez-Redondo, G. I., Medina-Burgos, P., Cases, I., Fernández, R., & Rojas, A. M. (2024).
Decoding proteome functional information in model organisms using protein language models.
DOI: 10.1101/2024.02.14.580341

👥 Project Team

🔧 Technical Team

Francisco Miguel Pérez Canales: fmpercan@upo.es
Author of the system’s engineering and technical implementation
Francisco J. Ruiz Mota: fraruimot@alum.us.es
Junior developer

🧬 Scientific Team & Original Authors of FANTASIA v1

Ana M. Rojas: a.rojas.m@csic.es
Gemma I. Martínez-Redondo: gemma.martinez@ibe.upf-csic.es
Rosa Fernández: rosa.fernandez@ibe.upf-csic.es

Release history

Version	Release date	Notes
4.1.1	Feb 18, 2026
4.1.0	Jan 21, 2026
4.0.3	Oct 20, 2025
4.0.2	Oct 9, 2025	yanked
4.0.1	Sep 29, 2025	yanked
4.0.0	Oct 20, 2025
3.0.1	Sep 17, 2025
3.0.0	Jul 29, 2025
2.8.7	Jul 18, 2025
2.8.2	Jul 8, 2025
2.8.1	Jun 30, 2025
2.8.0	Jun 20, 2025
2.7.0	Jun 9, 2025
2.6.0	Jun 3, 2025
2.5.0	May 16, 2025
2.4.0	May 13, 2025
2.3.0	May 13, 2025
2.2.0	May 13, 2025
2.1.0	May 7, 2025	this version, yanked
1.8.0	May 2, 2025
1.7.0	Apr 16, 2025
1.6.0	Apr 16, 2025
1.5.0	Apr 16, 2025
1.4.0	Apr 10, 2025
1.3.0	Apr 8, 2025
1.2.0	Apr 4, 2025
1.1.0	Apr 4, 2025
1.0.0	Apr 4, 2025
0.13.3	Mar 15, 2025
0.13.2	Mar 15, 2025
0.13.1	Mar 14, 2025
0.13.0	Mar 14, 2025
0.12.0	Mar 14, 2025
0.11.0	Mar 14, 2025
0.10.0	Mar 13, 2025
0.9.0	Feb 25, 2025
0.8.0	Feb 18, 2025
0.7.0	Feb 13, 2025
0.5.0	Jan 14, 2025
0.4.0	Jan 14, 2025
0.3.0	Jan 13, 2025
0.2.0	Jan 10, 2025
0.1.0	Jan 9, 2025

Download files

Source Distribution

fantasia-2.1.0.tar.gz (28.9 kB)
Uploaded May 7, 2025 Source

Built Distribution

fantasia-2.1.0-py3-none-any.whl (31.0 kB)
Uploaded May 7, 2025 Python 3

File details

Details for the file fantasia-2.1.0.tar.gz.

File metadata

Download URL: fantasia-2.1.0.tar.gz
Upload date: May 7, 2025
Size: 28.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.4.0 CPython/3.10.17 Linux/6.8.0-1027-azure

File hashes

Hashes for fantasia-2.1.0.tar.gz
Algorithm	Hash digest
SHA256	`d4b50074059c206c70b6311b93bf23c1ad8e2eb04e59ab1c2608c493e302fe48`
MD5	`4b700078c8a09d1e238fc750805d7854`
BLAKE2b-256	`28f0b58dcf0e5e07f3aa63aa8e68ffa56d99af6d64e2313751fef0adf769dd8c`
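
A downloaded archive can be checked against the published SHA256 digest with a few lines of standard-library Python. This is a minimal sketch: the function names and the chunk size are illustrative choices, and the file path in the usage note assumes the archive sits in the current directory.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For the source distribution, `matches_published_hash("fantasia-2.1.0.tar.gz", "d4b50074059c206c70b6311b93bf23c1ad8e2eb04e59ab1c2608c493e302fe48")` should return `True` if the download is intact.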

File details

Details for the file fantasia-2.1.0-py3-none-any.whl.

File metadata

Download URL: fantasia-2.1.0-py3-none-any.whl
Upload date: May 7, 2025
Size: 31.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.4.0 CPython/3.10.17 Linux/6.8.0-1027-azure

File hashes

Hashes for fantasia-2.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a9cc7addb3c246c13bb084d4d7c5f3e74f3864c58fa70619371aaeb68a2788d3`
MD5	`b90c17963bdde4bd01ab2c78e5791b45`
BLAKE2b-256	`0b48ff857899a8cac7ffd43600080231392172a6c6cb5d7ac9caa06ebd220230`
