A library for comparing embedding spaces

Project description

ema-tool

ema-tool is a Python library designed to facilitate the initial comparison of diverse embedding spaces in biomedical data. By incorporating user-defined metadata on the natural grouping of data points, ema-tool enables users to compare global statistics and understand the differences in clustering of natural groupings across different embedding spaces.

Overview

Features

Given a set of samples with metadata and at least two embedding spaces, ema-tool provides visualisations to compare the following aspects of the embedding spaces:

  • Unsupervised Clusters: ema-tool provides a simple interface to cluster samples in the embedding space using the KMeans algorithm and to compare the resulting clusters against user-defined metadata.
  • Dimensionality Reduction: ema-tool allows users to reduce the dimensionality of the embedding space using PCA, t-SNE, or UMAP.
  • Pairwise Distances: ema-tool computes pairwise distances between samples in the embedding space. Different distance metrics are available, including Euclidean, Cosine, and Mahalanobis.
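
The three comparisons listed above can be sketched with scikit-learn and SciPy. This is a minimal illustration on synthetic data, not ema-tool's own API: the embedding names `space_a` and `space_b`, the helper `make_embedding`, the use of the adjusted Rand index as the agreement score, and the correlation of pairwise distances as a global statistic are all assumptions made for the example.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(0)

# Synthetic stand-in for real data: 60 samples in 3 metadata groups.
n_groups, n_per_group = 3, 20
groups = np.repeat(np.arange(n_groups), n_per_group)


def make_embedding(dim, spread):
    """Hypothetical helper: one Gaussian blob per metadata group."""
    centers = rng.normal(scale=5.0, size=(n_groups, dim))
    noise = rng.normal(scale=spread, size=(groups.size, dim))
    return centers[groups] + noise


# Two embedding spaces of different dimensionality for the same samples.
embeddings = {
    "space_a": make_embedding(dim=32, spread=0.5),
    "space_b": make_embedding(dim=64, spread=0.5),
}

results = {}
for name, X in embeddings.items():
    # Unsupervised clusters, scored against the metadata groups.
    labels = KMeans(n_clusters=n_groups, n_init=10, random_state=0).fit_predict(X)
    ari = adjusted_rand_score(groups, labels)
    # 2-D projection suitable for a scatter plot.
    coords = PCA(n_components=2).fit_transform(X)
    # Condensed vector of pairwise cosine distances between samples.
    cond = pdist(X, metric="cosine")
    results[name] = {"ari": ari, "coords": coords, "cond": cond, "dist": squareform(cond)}
    print(f"{name}: ARI vs. metadata = {ari:.2f}, mean cosine distance = {cond.mean():.3f}")

# One simple global statistic: how similarly the two spaces arrange the samples.
corr = np.corrcoef(results["space_a"]["cond"], results["space_b"]["cond"])[0, 1]
print(f"correlation of pairwise distances across spaces: {corr:.2f}")
```

Because the two spaces can have different dimensionality, the sketch compares them only through quantities defined per sample or per sample pair (cluster labels, 2-D coordinates, distances) rather than through the raw vectors.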

The following figure provides an overview of the ema-tool workflow:

[Figure: ema-tool workflow overview]

Installation

You can install the ema library with pip, or clone the GitHub repository to run the examples locally.

Installing the ema library

pip install ema-emb
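
To confirm that the installation succeeded, the installed version can be read back from the package metadata. The following is a small sketch using only the Python standard library; the function name `installed_version` is hypothetical, and the distribution name `ema-emb` is taken from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name="ema-emb"):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


print(installed_version() or "ema-emb is not installed in this environment")
```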

Cloning the ema repo

git clone https://github.com/pia-francesca/ema

cd ema                         # enter project directory
pip3 install .                 # install the package and its dependencies
jupyter lab colab_notebooks    # open notebook examples in jupyter for local exploration

Colab Notebook Example

An example of how to use the ema-tool library is provided in the following Colab notebook:

Open In Colab

Links to Embedding Scripts

To allow flexible use, ema-tool does not include scripts for generating embeddings. However, here are some links to external scripts for generating protein embeddings from FASTA files using the following models:

Contact

If you have any questions or suggestions, please feel free to reach out to the authors: francesca.risom@hpi.de.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Download files

Source Distribution: ema_emb-0.0.5.tar.gz (12.9 MB)

Built Distribution: ema_emb-0.0.5-py3-none-any.whl (305.3 kB, Python 3)
