A library for comparing embedding spaces

ema-tool

ema-tool is a Python library designed to facilitate the initial comparison of diverse embedding spaces in biomedical data. By incorporating user-defined metadata on the natural grouping of data points, ema-tool enables users to compare global statistics and understand the differences in clustering of natural groupings across different embedding spaces.

Overview

Features

Given a set of samples with metadata and at least two embedding spaces, ema-tool provides visualisations to compare the following aspects of the embedding spaces:

  • Unsupervised Clusters: ema-tool provides a simple interface to cluster samples in the embedding space using the KMeans algorithm and to compare the resulting clusters against user-defined metadata.
  • Dimensionality Reduction: ema-tool allows users to reduce the dimensionality of the embedding space using PCA, t-SNE, or UMAP.
  • Pairwise Distances: ema-tool computes pairwise distances between samples in the embedding space. Different distance metrics are available, including Euclidean, Cosine, and Mahalanobis.
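
The three comparisons listed above can be sketched outside of ema-tool with NumPy, SciPy, and scikit-learn. This is a minimal illustration of the underlying ideas, not ema-tool's own API: the synthetic data, the helper name `compare_to_metadata`, and the choice of the adjusted Rand index and Spearman correlation as summary scores are all assumptions made for this example.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(0)

# Synthetic stand-ins (assumption): 60 samples in 3 known groups,
# embedded in two spaces of different dimensionality.
groups = np.repeat([0, 1, 2], 20)                      # user-defined metadata
centers_a = rng.normal(scale=5.0, size=(3, 16))
centers_b = rng.normal(scale=5.0, size=(3, 32))
emb_a = centers_a[groups] + rng.normal(size=(60, 16))  # embedding space A
emb_b = centers_b[groups] + rng.normal(size=(60, 32))  # embedding space B


def compare_to_metadata(emb, labels, n_clusters):
    """Cluster with KMeans and score agreement with the metadata labels."""
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    predicted = model.fit_predict(emb)
    return adjusted_rand_score(labels, predicted)


# Unsupervised clusters: how well does each space recover the known groups?
ari_a = compare_to_metadata(emb_a, groups, n_clusters=3)
ari_b = compare_to_metadata(emb_b, groups, n_clusters=3)

# Dimensionality reduction: project each space to 2D for plotting.
coords_a = PCA(n_components=2).fit_transform(emb_a)
coords_b = PCA(n_components=2).fit_transform(emb_b)

# Pairwise distances: condensed distance vectors, one entry per sample pair.
# Other metrics such as "cosine" can be passed to pdist instead.
dist_a = pdist(emb_a, metric="euclidean")
dist_b = pdist(emb_b, metric="euclidean")

# Rank correlation of the two distance vectors: do the spaces agree on
# which sample pairs are close and which are far apart?
rho, _ = spearmanr(dist_a, dist_b)

print(f"ARI space A: {ari_a:.2f}, ARI space B: {ari_b:.2f}")
print(f"Spearman correlation of pairwise distances: {rho:.2f}")
```

Comparing distance vectors rather than raw coordinates sidesteps the fact that the two spaces have different dimensionality, since each vector has one entry per sample pair regardless of the embedding size.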

The following figure provides an overview of the ema-tool workflow:

[Figure: overview of the ema-tool workflow]

Installation

You can install the ema library with pip, or explore the examples locally by cloning the GitHub repository.

Installing the ema library

pip install ema-emb
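
To confirm that the installation succeeded, the installed distribution can be queried from Python using the standard library. The sketch below only assumes the distribution name `ema-emb` from the pip command above; the helper name `installed_version` is hypothetical.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("ema-emb")
if version is None:
    print("ema-emb is not installed in this environment")
else:
    print(f"ema-emb version: {version}")
```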

Cloning the ema repo

git clone https://github.com/pia-francesca/ema

cd ema                         # enter project directory
pip3 install .                 # install the package and its dependencies
jupyter lab colab_notebooks    # open notebook examples in jupyter for local exploration

Colab Notebook Example

An example of how to use the ema-tool library is provided in the following Colab notebook:

Open In Colab

Links to Embedding Scripts

To allow flexible use, ema-tool does not include scripts for generating embeddings. However, here are some links to external scripts for generating protein embeddings from FASTA files using the following models:

Contact

If you have any questions or suggestions, please feel free to reach out to the authors: francesca.risom@hpi.de.

License

This project is licensed under the MIT License - see the LICENSE file for details.
