USUM: Plotting sequence similarity using USEARCH & UMAP
USUM uses USEARCH and UMAP to plot DNA 🧬 and protein 🧶 sequence similarity embeddings.
Installation
Install USEARCH
manually: https://drive5.com/usearch/download.html (consider supporting the author by buying the 64-bit license)
Install usum
using pip:
pip install usum
Usage
Minimal example
usum sequences.fa --maxdist 0.2 --termdist 0.3 --output umap
Multiple input files with labels
usum first.fa second.fa --labels First Second --maxdist 0.2 --termdist 0.3 --output umap
This will produce a PNG plot.
An interactive Bokeh HTML plot is also created.
Programmatic use
from usum import usum
# Show help
help(usum)
# Run USUM
usum(inputs=['input.fa'], output='usum', maxdist=0.2, termdist=0.3)
How it works
- A sparse distance matrix is calculated using the USEARCH calc_distmx command.
- The distance matrix is embedded as a precomputed metric using UMAP.
- The embedding is plotted using umap.plot.
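The steps above can be sketched roughly as follows. This is an illustration, not usum's actual internals: the function names are hypothetical, the `-tabbedout`, `-maxdist` and `-termdist` options are taken from the USEARCH calc_distmx documentation, and the three-column `label1<TAB>label2<TAB>distance` output format is assumed. The embedding step needs `scipy` and `umap-learn`, which are imported only when it is called.

```python
import io


def usearch_distmx_command(fasta, out, maxdist=0.2, termdist=0.3):
    """Build an argument list for USEARCH calc_distmx (hypothetical helper)."""
    return ["usearch", "-calc_distmx", fasta, "-tabbedout", out,
            "-maxdist", str(maxdist), "-termdist", str(termdist)]


def read_tabbed_distances(handle):
    """Parse 'label1<TAB>label2<TAB>distance' lines into COO-style lists.

    Assumes the three-column tabbed format; returns the labels in order of
    first appearance plus row indices, column indices and distances.
    """
    index = {}
    rows, cols, dists = [], [], []
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue
        first, second, dist = line.split("\t")
        rows.append(index.setdefault(first, len(index)))
        cols.append(index.setdefault(second, len(index)))
        dists.append(float(dist))
    return list(index), rows, cols, dists


def embed(labels, rows, cols, dists, n_neighbors=15):
    """Embed a sparse precomputed distance matrix with UMAP (sketch only)."""
    # Imported lazily so the parsing helpers work without these packages.
    from scipy.sparse import coo_matrix
    import umap

    n = len(labels)
    matrix = coo_matrix((dists, (rows, cols)), shape=(n, n)).tocsr()
    reducer = umap.UMAP(metric="precomputed", n_neighbors=n_neighbors)
    return reducer.fit_transform(matrix)


# Parsing a tiny in-memory example instead of a real USEARCH output file.
sample = io.StringIO("seqA\tseqB\t0.10\nseqB\tseqC\t0.25\n")
labels, rows, cols, dists = read_tabbed_distances(sample)
```

In a real run, the command list would be passed to `subprocess.run`, the resulting file opened and handed to `read_tabbed_distances`, and the result passed to `embed`. UMAP may need each sequence to have enough neighbours within `maxdist` for a sparse input to work, so treat the parameters here as placeholders.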