
Functional ANnoTAtion based on embedding space SImilArity



FANTASIA


FANTASIA (Functional ANnoTAtion based on embedding space SImilArity) is a pipeline for annotating protein sequences with Gene Ontology (GO) terms using protein language models such as ProtT5, ProstT5, and ESM2. It automates the full workflow, from sequence preprocessing to functional annotation, providing a scalable solution for protein function annotation.


Key Features

  • Redundancy Filtering: Removes identical sequences with CD-HIT and optionally excludes sequences based on length constraints.
  • Embedding Generation: Computes sequence embeddings with protein language models (ProtT5, ProstT5, ESM2).
  • GO Term Lookup: Searches a vector database for reference proteins with similar embeddings and retrieves their GO terms.
  • Results: Outputs annotations in timestamped CSV files for reproducibility.
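The exact-duplicate part of Redundancy Filtering, plus the optional length limits, can be approximated in a few lines of plain Python. This is an illustrative sketch only: `filter_sequences` and its `min_length`/`max_length` parameters are hypothetical and not part of the FANTASIA API. It compares sequences as exact strings instead of calling CD-HIT, which clusters sequences by alignment and can therefore merge sequences that this comparison keeps apart.

```python
from typing import Dict, Iterable, Optional, Tuple


def filter_sequences(
    records: Iterable[Tuple[str, str]],
    min_length: Optional[int] = None,
    max_length: Optional[int] = None,
) -> Dict[str, str]:
    """Keep the first copy of each identical sequence, then apply optional length bounds.

    Hypothetical helper: an exact-match stand-in for the CD-HIT step,
    not FANTASIA's implementation.
    """
    seen = set()
    kept: Dict[str, str] = {}
    for seq_id, seq in records:
        seq = seq.strip().upper()  # normalise case and surrounding whitespace
        if seq in seen:
            continue  # identical to a sequence that was already seen
        seen.add(seq)
        if min_length is not None and len(seq) < min_length:
            continue  # too short
        if max_length is not None and len(seq) > max_length:
            continue  # too long
        kept[seq_id] = seq
    return kept
```

For example, `filter_sequences([("p1", "MKTAYIAK"), ("p2", "MKTAYIAK"), ("p3", "MK"), ("p4", "MSTNPKPQRK")], min_length=5)` keeps `p1` and `p4`: `p2` duplicates `p1`, and `p3` is shorter than five residues.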

Installation

FANTASIA requires Python 3.8 or later. Install it with pip:

pip install fantasia

Quick Start

Prerequisites

Ensure the Information System is properly configured before running FANTASIA. Detailed instructions are available in the project documentation.

Running the Pipeline

Execute the following command, specifying the path to the configuration file:

python main.py --config <path_to_config.yaml>

Pipeline Overview

  1. Redundancy Filtering: Removes identical sequences and optionally filters sequences based on length.
  2. Embedding Generation: Computes embeddings for sequences using supported models and stores them in HDF5 format.
  3. GO Term Lookup: Queries a vector database for reference proteins with similar embeddings and transfers their GO terms to each query sequence.
  4. Output: Saves annotations in a structured CSV file.
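Steps 3 and 4 amount to a nearest-neighbour search in embedding space followed by GO term transfer and a CSV export. The sketch below is a minimal in-memory approximation, assuming precomputed per-protein embedding vectors and Euclidean distance. FANTASIA itself queries a vector database, and the function names, the `k` parameter, the distance metric, the column names, and the `annotations_<timestamp>.csv` naming scheme here are all hypothetical.

```python
import csv
import math
from datetime import datetime
from pathlib import Path
from typing import Dict, Iterable, List, Optional, Sequence, Set, Tuple

Vector = Sequence[float]
Row = Tuple[str, str, str, float]


def transfer_go_terms(
    queries: Dict[str, Vector],
    references: Dict[str, Tuple[Vector, Set[str]]],
    k: int = 1,
) -> List[Row]:
    """Give each query the GO terms of its k nearest reference embeddings.

    Hypothetical in-memory stand-in for the vector-database query.
    Returns (query_id, reference_id, go_term, distance) rows.
    """
    rows: List[Row] = []
    for query_id, query_vec in queries.items():
        # Euclidean distance from the query to every reference, closest first.
        scored = sorted(
            (math.dist(query_vec, ref_vec), ref_id)
            for ref_id, (ref_vec, _) in references.items()
        )
        for distance, ref_id in scored[:k]:
            # Transfer every GO term of this neighbour to the query.
            for go_term in sorted(references[ref_id][1]):
                rows.append((query_id, ref_id, go_term, distance))
    return rows


def timestamped_csv_path(out_dir: Path, now: Optional[datetime] = None) -> Path:
    """Build a run-specific file name, e.g. annotations_20240102_030405.csv."""
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    return out_dir / f"annotations_{stamp}.csv"


def write_annotations_csv(rows: Iterable[Row], path: Path) -> None:
    """Write the annotation rows, preceded by a header line, to a CSV file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["query_id", "reference_id", "go_term", "distance"])
        writer.writerows(rows)
```

For example, with `queries = {"q1": [0.1, 0.9]}` and `references = {"r1": ([0.0, 1.0], {"GO:0005524"}), "r2": ([1.0, 0.0], {"GO:0003677"})}`, `transfer_go_terms(queries, references, k=1)` returns one row pairing `q1` with `r1` and `GO:0005524` at a distance of about 0.141. Embedding the run time in the file name means repeated runs do not overwrite each other's results.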

Documentation

For complete details on pipeline configuration, parameters, and deployment, visit the FANTASIA Documentation.


Citation

If you use FANTASIA in your work, please cite the following:

  1. Martínez-Redondo, G. I., Barrios, I., Vázquez-Valls, M., Rojas, A. M., & Fernández, R. (2024). Illuminating the functional landscape of the dark proteome across the Animal Tree of Life.
    https://doi.org/10.1101/2024.02.28.582465

  2. Barrios-Núñez, I., Martínez-Redondo, G. I., Medina-Burgos, P., Cases, I., Fernández, R., & Rojas, A. M. (2024). Decoding proteome functional information in model organisms using protein language models.
    https://doi.org/10.1101/2024.02.14.580341


Contact Information



Download files

Download the file for your platform.

Source Distribution

fantasia-0.1.0.tar.gz (11.7 kB)


Built Distribution


fantasia-0.1.0-py3-none-any.whl (14.5 kB)


File details

Details for the file fantasia-0.1.0.tar.gz.

File metadata

  • Download URL: fantasia-0.1.0.tar.gz
  • Size: 11.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.4.0 CPython/3.10.16 Linux/6.5.0-1025-azure

File hashes

Hashes for fantasia-0.1.0.tar.gz
  • SHA256: b2d4999163e0b570b4440dad6e2fa9b989e5f0559fd7d35b9d94257bee8906c2
  • MD5: c61777b970e78e1d64e4cabd22ec00d4
  • BLAKE2b-256: d45c684553ec723255b61934e025f403d7f27559c962328f6e5bd86e4c9fa3f9


File details

Details for the file fantasia-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: fantasia-0.1.0-py3-none-any.whl
  • Size: 14.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.4.0 CPython/3.10.16 Linux/6.5.0-1025-azure

File hashes

Hashes for fantasia-0.1.0-py3-none-any.whl
  • SHA256: b91c6daba6147be371c4bdd001a4616015d70991e0f01b8430d75b939038d120
  • MD5: e764e6968a1194a94ace74b7942871fa
  • BLAKE2b-256: c7cfc3d452c39e57211882e90b03b7df6f13aeb1e908b2927b70ea902aecaf29

