A library for comparing embedding spaces
ema-tool
ema-tool is a Python library for the initial comparison of different embedding spaces of biomedical data. Using user-defined metadata that describes the natural grouping of data points, ema-tool lets users compare global statistics across embedding spaces and examine how these natural groupings cluster in each space.
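As a rough illustration of comparing how natural groupings cluster in two embedding spaces, the sketch below scores each space by the ratio of mean between-group to mean within-group Euclidean distance. This statistic, the function name `group_separation`, and the toy protein-family labels are assumptions for illustration only; they are not ema-tool's API or necessarily the statistics it reports.

```python
import math
from itertools import combinations
from statistics import mean


def group_separation(embeddings, groups):
    """Mean between-group distance divided by mean within-group distance.

    embeddings: dict mapping sample ID -> embedding vector (sequence of floats)
    groups:     dict mapping sample ID -> group label from the user metadata
    A value above 1 means samples sharing a label lie closer together
    than samples with different labels.
    """
    within, between = [], []
    for a, b in combinations(sorted(embeddings), 2):
        d = math.dist(embeddings[a], embeddings[b])  # Euclidean distance
        (within if groups[a] == groups[b] else between).append(d)
    return mean(between) / mean(within)


# Toy data: the same four samples embedded by two hypothetical models.
groups = {"p1": "kinase", "p2": "kinase", "p3": "protease", "p4": "protease"}
space_a = {"p1": [0.0, 0.0], "p2": [0.0, 1.0], "p3": [5.0, 0.0], "p4": [5.0, 1.0]}
space_b = {"p1": [0.0, 0.0], "p2": [5.0, 0.0], "p3": [0.0, 1.0], "p4": [5.0, 1.0]}

for name, space in [("A", space_a), ("B", space_b)]:
    print(name, round(group_separation(space, groups), 2))
```

On the toy data, space A scores about 5.05 (the two families form separate clusters) and space B about 0.61 (the families are interleaved).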
Overview
Features
Given a set of samples with metadata and at least two embedding spaces, ema-tool provides visualisations for comparing the following aspects of the embedding spaces:
- Unsupervised Clusters: ema-tool provides a simple interface for clustering the samples in each embedding space with the KMeans algorithm and comparing the resulting clusters against the user-defined metadata.
- Dimensionality Reduction: ema-tool allows users to reduce the dimensionality of the embedding space using PCA, t-SNE, or UMAP.
- Pairwise Distances: ema-tool computes pairwise distances between samples in each embedding space, with a choice of distance metrics including Euclidean, cosine, and Mahalanobis.
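The KMeans comparison can be pictured with the following from-scratch sketch of Lloyd's algorithm, which clusters the samples and then measures agreement with the metadata using cluster purity. The deterministic farthest-first seeding, the purity score, and all function names are assumptions chosen to keep the example dependency-free; ema-tool may use a different implementation (scikit-learn's `KMeans` is a common choice in practice) and a different agreement measure.

```python
import math
from collections import Counter


def farthest_first_init(points, k):
    """Deterministic seeding: start at the first point, then add the point farthest from all chosen centres."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def nearest(p, centroids):
    """Index of the centroid closest to p (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))


def kmeans(points, k, n_iter=100):
    """Lloyd's k-means on equal-length vectors; returns one cluster index per point."""
    centroids = farthest_first_init(points, k)
    labels = None
    for _ in range(n_iter):
        new_labels = [nearest(p, centroids) for p in points]  # assignment step
        if new_labels == labels:
            break  # converged: no assignment changed
        labels = new_labels
        for c in range(k):  # update step: centroid = mean of its members
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # leave a centroid in place if its cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def purity(cluster_labels, metadata_labels):
    """Share of samples whose cluster's most common metadata label equals their own."""
    per_cluster = {}
    for c, m in zip(cluster_labels, metadata_labels):
        per_cluster.setdefault(c, Counter())[m] += 1
    majority_total = sum(cnt.most_common(1)[0][1] for cnt in per_cluster.values())
    return majority_total / len(metadata_labels)


points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
families = ["A", "A", "A", "B", "B", "B"]  # hypothetical metadata labels
labels = kmeans(points, k=2)
print(labels, purity(labels, families))
```

Here the call prints `[0, 0, 0, 1, 1, 1] 1.0`: each cluster contains a single metadata label.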
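Of the three dimensionality-reduction methods, PCA is the simplest to sketch: centre the embeddings, take an SVD, and project onto the leading right singular vectors. This NumPy version and its `pca` name are illustrative assumptions; t-SNE and UMAP are non-linear methods usually taken from libraries such as `sklearn.manifold.TSNE` and the `umap-learn` package, and are not reimplemented here.

```python
import numpy as np


def pca(X, n_components=2):
    """Project the rows of X onto their top principal components via an SVD of the centred data."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # centre every embedding dimension
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S**2 / np.sum(S**2)  # fraction of total variance per component
    return Xc @ Vt[:n_components].T, explained[:n_components]


# Hypothetical 4 x 3 embedding matrix whose points lie on a single line.
X = np.array([[0.0, 0.0, 0.0], [1.0, 2.0, 2.0], [2.0, 4.0, 4.0], [3.0, 6.0, 6.0]])
coords, ratio = pca(X, n_components=1)
print(coords.ravel().round(3), ratio.round(3))
```

The signs of principal components are arbitrary, so the projected coordinates may come out negated; the single component explains all of the variance because the toy points are collinear.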
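For the pairwise-distance feature, a NumPy sketch of the three metrics named above follows. Cosine distance is taken as 1 minus cosine similarity, and Mahalanobis distance uses the pseudo-inverse of the sample covariance of the embedding dimensions (so it still runs when there are more dimensions than samples); both conventions, and the function name `pairwise_distances`, are assumptions rather than ema-tool's documented behaviour. SciPy's `scipy.spatial.distance.cdist` offers the same metrics ready-made.

```python
import numpy as np


def pairwise_distances(X, metric="euclidean"):
    """n x n distance matrix between the rows (samples) of an n x d embedding matrix."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]  # shape (n, n, d): all pairwise differences
    if metric == "euclidean":
        return np.sqrt((diff ** 2).sum(axis=-1))
    if metric == "cosine":
        unit = X / np.linalg.norm(X, axis=1, keepdims=True)  # assumes no all-zero rows
        return 1.0 - unit @ unit.T
    if metric == "mahalanobis":
        # Pseudo-inverse of the covariance of the embedding dimensions (robust if singular).
        VI = np.linalg.pinv(np.cov(X, rowvar=False))
        sq = np.einsum("ijk,kl,ijl->ij", diff, VI, diff)
        return np.sqrt(np.maximum(sq, 0.0))  # clip tiny negative round-off
    raise ValueError(f"unsupported metric: {metric!r}")


X = np.array([[1.0, 0.0], [0.0, 2.0], [3.0, 0.0]])  # 3 samples, 2 dimensions
for metric in ("euclidean", "cosine", "mahalanobis"):
    print(metric, pairwise_distances(X, metric).round(3), sep="\n")
```

The broadcasted `diff` array holds n × n × d values, so for large sample sets a chunked or library implementation is preferable.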
The following figure provides an overview of the ema-tool workflow:
Installation
You can install the ema library with pip, or clone the GitHub repository to run the examples locally.
Installing the ema library
pip install ema-emb
Cloning the ema repo
git clone https://github.com/pia-francesca/ema
cd ema # enter project directory
pip3 install . # install ema and its dependencies
jupyter lab colab_notebooks # open notebook examples in jupyter for local exploration
Colab Notebook Example
An example of how to use the ema-tool library is provided in the following Colab notebook:
Links to Embedding Scripts
To keep usage flexible, ema-tool does not include scripts for generating the embeddings. However, here are links to external scripts that generate protein embeddings from FASTA files using the following models:
Contact
If you have any questions or suggestions, please feel free to reach out to the authors: francesca.risom@hpi.de.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Download files
File details
Details for the file ema_emb-0.0.5.tar.gz.
File metadata
- Download URL: ema_emb-0.0.5.tar.gz
- Upload date:
- Size: 12.9 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: python-httpx/0.27.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | 315679b6246cf78bffbfa792cde4801889c924be568f16ca934a044d8d02b950
MD5 | 34a108328660f25e5cbbeaf6fd12bbcd
BLAKE2b-256 | d29ac9a8d7df7abb020db8621748e1ed135c7c59ba62aad03397be57bba529f0
File details
Details for the file ema_emb-0.0.5-py3-none-any.whl.
File metadata
- Download URL: ema_emb-0.0.5-py3-none-any.whl
- Upload date:
- Size: 305.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: python-httpx/0.27.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | 3d62e243c43cec46d3c784a1992b4e0553a2b1f3260cf0fb57c2fd2d538af96f
MD5 | 305facdfbe2744253e8757e0e883877b
BLAKE2b-256 | dcd77b3ede074e5107cdd298a6cea7d4b60c8549c4729ba3fc61d105e7bf6086
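The SHA256 digests listed for the two release files can be checked after a manual download. The sketch below hashes a file with Python's standard `hashlib` and compares the result with the published values; the helper names and the 1 MiB chunk size are assumptions for illustration.

```python
import hashlib
import os

# SHA256 digests listed for the ema-emb 0.0.5 release files.
EXPECTED_SHA256 = {
    "ema_emb-0.0.5.tar.gz": "315679b6246cf78bffbfa792cde4801889c924be568f16ca934a044d8d02b950",
    "ema_emb-0.0.5-py3-none-any.whl": "3d62e243c43cec46d3c784a1992b4e0553a2b1f3260cf0fb57c2fd2d538af96f",
}


def file_digest_hex(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash a file in chunks so a large archive never has to fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_download(path):
    """True only if the file name is a known release file and its SHA256 digest matches."""
    expected = EXPECTED_SHA256.get(os.path.basename(path))
    return expected is not None and file_digest_hex(path) == expected


# Usage, after downloading the source distribution into the current directory:
# print(verify_download("ema_emb-0.0.5.tar.gz"))
```

pip can also enforce these digests itself in hash-checking mode, by listing `--hash=sha256:<digest>` next to the requirement in a requirements file and installing with `--require-hashes`.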