fast command-line extensions for sourmash
sourmash_plugin_branchwater
tl;dr Run faster, lower-memory versions of sourmash functions via this plugin.
Details
sourmash is a command-line tool and Python/Rust library for metagenome analysis and genome comparison using k-mers. While sourmash is fast and low memory, sourmash v4 and earlier run in single-threaded mode and rely on Python containers.
The branchwater plugin for sourmash (this plugin!) provides faster and lower-memory implementations of several important sourmash features: sketching, searching, and gather (metagenome decomposition). It does so by implementing higher-level functions in Rust on top of the core Rust library of sourmash. As a result, it provides some of the same functionality as sourmash, but 10-100x faster and with 10x less memory.
This code is still in prototype mode, and does not have all of the features of sourmash. As we add features we will move it back into the core sourmash code base; eventually, much of the code in this repository will be integrated into sourmash directly.
If you're intrigued but not sure where to start with this plugin, we suggest first identifying what sourmash functionality you need to run to accomplish your goals. Once you have your sourmash commands working, revisit these docs and see if there is a faster implementation available in this plugin!
This repo originated as a PyO3-based Python wrapper around the core branchwater code. Branchwater is a fast, low-memory and multithreaded application for searching very large collections of FracMinHash sketches as generated by sourmash.
For technical details, see the Rust code in src/ and the Python wrapper in src/python/.
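To make the term "FracMinHash sketch" concrete, here is a minimal pure-Python sketch of the idea: hash every k-mer and keep only the hashes that fall below 2**64 / scaled. It is an illustration only, not the plugin's Rust implementation. The function names are hypothetical, the hash function is a stand-in for the MurmurHash-based hashing that sourmash uses, and reverse-complement handling is omitted.

```python
import hashlib


def kmers(seq, k):
    """Yield every overlapping k-mer of a sequence."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def hash64(kmer):
    """Map a k-mer to a 64-bit integer (stand-in hash, not sourmash's own)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def frac_minhash(seq, k=21, scaled=1000):
    """Keep hashes below 2**64 / scaled, i.e. roughly 1/scaled of all k-mers."""
    max_hash = 2**64 // scaled
    return {h for h in map(hash64, kmers(seq, k)) if h < max_hash}


def containment(query, match):
    """Fraction of the query sketch that is also present in the match sketch."""
    return len(query & match) / len(query) if query else 0.0


def jaccard(a, b):
    """Size of the intersection divided by the size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```

Because every sketch built with the same scaled value samples the same region of hash space, two sketches can be compared directly with set operations, which is what makes searching large collections cheap.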
Documentation
There is a quickstart below, as well as more user documentation here. Nascent developer docs are also available!
The betterplot plugin supplies a number of commands that work with branchwater output. In particular,
- mds2 and tsne2 generate MDS and tSNE plots from pairwise output;
- clustermap1 generates seaborn clustermaps from pairwise and multisearch output;
- clusters_to_categories uses the output of the cluster command to generate categories for coloring and labeling plots;
- pairwise_to_matrix converts the output of pairwise to a sourmash comparison matrix.
See the betterplot README for example figures and commands!
Quickstart demonstrating multisearch
This quickstart demonstrates multisearch using the 64 genomes from Awad et al., 2017.
1. Install the branchwater plugin
On Linux and Mac OS X, you can install the latest release of the branchwater plugin from conda-forge:
conda install sourmash_plugin_branchwater
Please see the developer docs for information on installing the latest development version.
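If you want to confirm from Python that the install worked, one option is to look up the installed distributions with the standard library. This is a minimal sketch; the helper name installed_version is hypothetical, and the distribution name sourmash_plugin_branchwater is assumed to match the package name used in the install command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Distribution names are assumptions based on the install command.
    for name in ("sourmash", "sourmash_plugin_branchwater"):
        print(name, installed_version(name) or "not installed")
```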
2. Download sketches.
The following command will download sourmash sketches for the podar genomes into the file podar-ref.zip:
curl -L https://osf.io/4t6cq/download -o podar-ref.zip
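A failed download can leave an error page saved under the .zip name, so it can be worth checking that the file really is a zip archive before searching it. The sketch below uses only the Python standard library; the helper name list_zip_members is hypothetical and the check says nothing about whether the sketches inside are valid.

```python
import zipfile


def list_zip_members(path):
    """Return the member names of a zip file, or raise ValueError if it is not one."""
    if not zipfile.is_zipfile(path):
        raise ValueError(f"{path} is not a zip archive; the download may have failed")
    with zipfile.ZipFile(path) as zf:
        return zf.namelist()
```

For example, len(list_zip_members("podar-ref.zip")) reports how many entries the downloaded archive contains.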
3. Execute!
Now run multisearch to search all the sketches against each other:
sourmash scripts multisearch podar-ref.zip podar-ref.zip -o results.csv --cores 4
You will (hopefully ;)) see a set of results in results.csv. These are comparisons of each query against all matching genomes.
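Once results.csv exists, a few lines of Python are enough to pull out the strongest non-self matches. This is a sketch under assumptions: it expects the CSV to have columns named query_name, match_name, and containment, so check the header of your own file first. The helper names load_results and top_matches are hypothetical.

```python
import csv


def load_results(path):
    """Read a results CSV into a list of dicts keyed by column name."""
    with open(path, newline="") as fp:
        return list(csv.DictReader(fp))


def top_matches(rows, min_containment=0.1):
    """Return (query, match, containment) tuples above a threshold, best first."""
    hits = []
    for row in rows:
        # Assumed column names; adjust if your results.csv header differs.
        if row["query_name"] == row["match_name"]:
            continue  # searching a collection against itself yields self-matches
        value = float(row["containment"])
        if value >= min_containment:
            hits.append((row["query_name"], row["match_name"], value))
    return sorted(hits, key=lambda hit: hit[2], reverse=True)
```

A typical use would be top_matches(load_results("results.csv"))[:10] to list the ten highest-containment pairs.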
Debugging help
If your collections aren't loading properly, try running sourmash sig summarize on them, like so:
sourmash sig summarize podar-ref.zip
If this doesn't work, then you're running into problems creating the collection. Please ask for help on the sourmash issue tracker!
Code of Conduct
This project is under the sourmash Code of Conduct.
License
This software is under the AGPL license. Please see LICENSE.txt.
Authors
- Luiz Irber
- C. Titus Brown
- Mohamed Abuelanin
- N. Tessa Pierce-Ward
- Olga Botvinnik
Download files
Source Distribution
File details
Details for the file sourmash_plugin_branchwater-0.9.11.tar.gz.
File metadata
- Download URL: sourmash_plugin_branchwater-0.9.11.tar.gz
- Upload date:
- Size: 1.3 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.11.6
File hashes
Algorithm | Hash digest
---|---
SHA256 | a2e6ff5fa0c5a41c1e6bdefc393b953fad6002106ef0d73587e6a2ce7d0e97d9
MD5 | d59ab6a545a2a6147ebb773b2dfa1b80
BLAKE2b-256 | 4a44734eb31412ab1c20013113e0ba1d1ecb93ff51422e378c2c88f7a0cdbed5