Functional ANnoTAtion based on embedding space SImilArity

FANTASIA

FANTASIA (Functional ANnoTAtion based on embedding space SImilArity) is a pipeline that annotates protein sequences with Gene Ontology (GO) terms using protein language models such as ProtT5, ProstT5, and ESM2. It automates the full workflow, from sequence processing to functional annotation, and provides a scalable, efficient way to analyze protein structure and function.


Key Features

  • Redundancy Filtering: Removes identical sequences with CD-HIT and optionally excludes sequences that fall outside configured length limits.
  • Embedding Generation: Computes protein sequence embeddings with protein language models (ProtT5, ProstT5, ESM2).
  • GO Term Lookup: Matches query embeddings against a vector database to retrieve the GO terms associated with similar proteins.
  • Results: Writes annotations to timestamped CSV files for reproducibility.
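
As a rough illustration of the redundancy filtering feature, the sketch below drops exact duplicate sequences and applies an optional length cutoff. It is a pure-Python stand-in and not FANTASIA's own code: the pipeline itself relies on CD-HIT, and the function name `filter_sequences` and the `max_length` parameter are hypothetical.

```python
from typing import Dict, Optional


def filter_sequences(
    sequences: Dict[str, str], max_length: Optional[int] = None
) -> Dict[str, str]:
    """Keep the first record of each distinct sequence within the length limit.

    Hypothetical helper for illustration; FANTASIA uses CD-HIT for this step.
    """
    seen = set()
    kept: Dict[str, str] = {}
    for seq_id, seq in sequences.items():
        normalized = seq.strip().upper()
        # Optional length constraint: skip sequences that are too long.
        if max_length is not None and len(normalized) > max_length:
            continue
        # Exact-duplicate removal: only the first occurrence is kept.
        if normalized in seen:
            continue
        seen.add(normalized)
        kept[seq_id] = normalized
    return kept


# Toy input: p2 duplicates p1 (case-insensitively), p4 exceeds the cutoff.
example = {
    "p1": "MKTAYIAK",
    "p2": "mktayiak",
    "p3": "MKT",
    "p4": "MKTAYIAKQRQISFVK",
}
print(filter_sequences(example, max_length=10))
# → {'p1': 'MKTAYIAK', 'p3': 'MKT'}
```

Filtering before embedding generation means each distinct sequence is embedded only once.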

Installation

To install FANTASIA, ensure you have Python 3.8+ installed and run the following command:

pip install fantasia
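
After installing, one way to confirm that the package is visible to the current interpreter is to query its metadata with the standard library. This is a small sketch and not part of FANTASIA; the helper name `installed_version` is hypothetical, and the distribution name `fantasia` is taken from the install command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# Prints the version string if the package is installed, otherwise None.
print(installed_version("fantasia"))
```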

Quick Start

Prerequisites

Ensure the Information System is properly configured before running FANTASIA. Detailed instructions are available in the project documentation.

Running the Pipeline

Execute the following command, specifying the path to the configuration file:

python main.py --config <path_to_config.yaml>

Pipeline Overview

  1. Redundancy Filtering: Removes identical sequences and optionally filters sequences based on length.
  2. Embedding Generation: Computes embeddings for sequences using supported models and stores them in HDF5 format.
  3. GO Term Lookup: Queries a vector database to find and annotate similar proteins.
  4. Output: Saves annotations in a structured CSV file.
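
The GO term lookup in step 3 can be pictured as a nearest-neighbour search in embedding space. The sketch below is a minimal in-memory version under stated assumptions: it uses cosine similarity (the source does not say which metric the vector database uses), the function names `cosine_similarity` and `lookup_go_terms` are hypothetical, and the two-dimensional embeddings and their GO term assignments are toy data, not real results.

```python
import math
from typing import List, Sequence, Tuple

Reference = Tuple[Sequence[float], List[str]]  # (embedding, GO terms)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def lookup_go_terms(
    query: Sequence[float], reference: List[Reference], k: int = 1
) -> List[Tuple[str, float]]:
    """Transfer GO terms from the k reference embeddings most similar to the query.

    Each term is reported with the highest similarity at which it was found.
    """
    scored = [(cosine_similarity(query, emb), terms) for emb, terms in reference]
    scored.sort(key=lambda item: item[0], reverse=True)
    best = {}
    for similarity, terms in scored[:k]:
        for term in terms:
            best[term] = max(similarity, best.get(term, -1.0))
    # Most similar first; ties are broken alphabetically by term.
    return sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy reference set standing in for the vector database.
reference = [
    ([1.0, 0.0], ["GO:0003677"]),
    ([0.0, 1.0], ["GO:0005524"]),
    ([1.0, 1.0], ["GO:0003677", "GO:0016301"]),
]
hits = lookup_go_terms([0.9, 0.1], reference, k=2)
print([term for term, _ in hits])
# → ['GO:0003677', 'GO:0016301']
```

A real deployment would delegate the search to the vector database rather than scanning every reference embedding, but the ranking idea is the same.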

Documentation

For complete details on pipeline configuration, parameters, and deployment, visit the FANTASIA Documentation.


Citation

If you use FANTASIA in your work, please cite the following:

  1. Martínez-Redondo, G. I., Barrios, I., Vázquez-Valls, M., Rojas, A. M., & Fernández, R. (2024). Illuminating the functional landscape of the dark proteome across the Animal Tree of Life.
    https://doi.org/10.1101/2024.02.28.582465

  2. Barrios-Núñez, I., Martínez-Redondo, G. I., Medina-Burgos, P., Cases, I., Fernández, R., & Rojas, A. M. (2024). Decoding proteome functional information in model organisms using protein language models.
    https://doi.org/10.1101/2024.02.14.580341

