USUM: Plotting sequence similarity using USEARCH & UMAP

USUM uses USEARCH and UMAP to plot DNA 🧬 and protein 🧶 sequence similarity embeddings.

Installation

Install USEARCH manually: https://drive5.com/usearch/download.html (consider supporting the author by buying the 64-bit license)

Install usum using pip:

pip install usum
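
Because USEARCH is installed separately, it can help to confirm that the binary is reachable before running usum. The following is a minimal sketch using only the Python standard library. The executable names are assumptions (the downloaded binary usually has a version- and platform-specific name and may need renaming), and it is also an assumption that usum locates USEARCH through PATH.

```python
import shutil

# Candidate executable names are assumptions: the downloaded USEARCH binary
# usually carries a version- and platform-specific name, so adjust this list.
CANDIDATE_NAMES = ("usearch", "usearch.exe")


def find_usearch(candidates=CANDIDATE_NAMES):
    """Return the full path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None


if __name__ == "__main__":
    path = find_usearch()
    if path is None:
        print("USEARCH not found on PATH; install it before running usum.")
    else:
        print(f"Found USEARCH at {path}")
```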

Usage

Minimal example

usum sequences.fa --maxdist 0.2 --termdist 0.3 --output umap

Multiple input files with labels

usum first.fa second.fa --labels First Second --maxdist 0.2 --termdist 0.3 --output umap

This will produce a PNG plot:

UMAP static example

An interactive Bokeh HTML plot is also created:

UMAP Bokeh example

Programmatic use

from usum import usum

# Show help
help(usum)

# Run USUM
usum(inputs=['input.fa'], output='usum', maxdist=0.2, termdist=0.3)

How it works

  • A sparse distance matrix is calculated using the USEARCH calc_distmx command.
  • The distance matrix is embedded with UMAP, using it as a precomputed metric.
  • The embedding is plotted using umap.plot.
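
The first two steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not usum's actual implementation. It assumes the distance matrix was written in USEARCH's tabbed pairs format (label1, label2, distance separated by tabs) and that scipy and umap-learn are installed. The function names parse_tabbed_pairs and embed are hypothetical. How UMAP treats pairs missing from a sparse precomputed matrix may depend on the umap-learn version.

```python
def parse_tabbed_pairs(lines):
    """Parse 'label1<TAB>label2<TAB>distance' rows into symmetric triplets.

    Returns (labels, rows, cols, vals), where labels[i] is the sequence
    label assigned to matrix index i, in order of first appearance.
    """
    index = {}   # label -> matrix index
    pairs = {}   # (smaller index, larger index) -> distance, deduplicated
    for line in lines:
        line = line.strip()
        if not line:
            continue
        a, b, dist = line.split("\t")[:3]
        i = index.setdefault(a, len(index))
        j = index.setdefault(b, len(index))
        if i != j:  # skip self-distances
            pairs[(min(i, j), max(i, j))] = float(dist)

    rows, cols, vals = [], [], []
    for (i, j), d in pairs.items():
        # Store both directions so the matrix is symmetric.
        rows += [i, j]
        cols += [j, i]
        vals += [d, d]
    return list(index), rows, cols, vals


def embed(labels, rows, cols, vals, **umap_kwargs):
    """Embed sparse distances with UMAP (needs scipy and umap-learn)."""
    from scipy.sparse import coo_matrix
    import umap

    n = len(labels)
    matrix = coo_matrix((vals, (rows, cols)), shape=(n, n)).tocsr()
    # metric="precomputed" tells UMAP the input already holds distances.
    return umap.UMAP(metric="precomputed", **umap_kwargs).fit_transform(matrix)
```

With a file produced by calc_distmx, the parser would be called as parse_tabbed_pairs(open(path)) and its result passed to embed. Imports of scipy and umap are kept inside embed so that the parsing step can be used on its own.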
