
fast command-line extensions for sourmash


sourmash_plugin_branchwater


tl;dr Do faster and lower-memory sourmash search & gather via a plugin.

Details

sourmash is a command-line tool and Python/Rust library for metagenome analysis and genome comparison using k-mers. While sourmash is fast and low-memory, sourmash v4 and earlier run in single-threaded mode with Python containers.

The branchwater plugin for sourmash (this plugin!) provides faster and lower-memory implementations of several important sourmash features: sketching, searching, and gather (metagenome decomposition). It does so by implementing higher-level functions in Rust on top of the core Rust library of sourmash. As a result, it provides some of the same functionality as sourmash, but 10-100x faster and with 10x lower memory use.
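Gather's metagenome decomposition is, at its core, a greedy procedure: repeatedly pick the reference that explains the most not-yet-explained hashes in the query, then remove those hashes. The following is a minimal pure-Python sketch of that idea, assuming sketches are plain sets of hashes. `gather` and its `threshold` parameter are hypothetical names chosen for illustration, not the plugin's API, and the real implementation reports many more statistics.

```python
def gather(query: set, refs: dict, threshold: int = 1) -> list:
    """Greedy decomposition of a query sketch into reference matches.

    query:     set of hashes from the metagenome sketch
    refs:      mapping of reference name -> set of hashes
    threshold: minimum number of newly explained hashes needed to report a
               match (hypothetical parameter, for illustration only)
    Returns (name, unique_hashes, fraction_of_query) tuples in rank order.
    """
    remaining = set(query)
    results = []
    while remaining:
        # Pick the reference that explains the most not-yet-assigned hashes.
        best_name, best_overlap = None, 0
        for name, hashes in refs.items():
            overlap = len(remaining & hashes)
            if overlap > best_overlap:
                best_name, best_overlap = name, overlap
        if best_name is None or best_overlap < threshold:
            break
        matched = remaining & refs[best_name]
        results.append((best_name, len(matched), len(matched) / len(query)))
        # Assigned hashes are removed so later matches are not double-counted.
        remaining -= matched
    return results


if __name__ == "__main__":
    metagenome = set(range(100))
    refs = {"A": set(range(0, 60)), "B": set(range(40, 100)), "C": set(range(50, 70))}
    for name, n, frac in gather(metagenome, refs):
        print(name, n, frac)  # prints "A 60 0.6", then "B 40 0.4"
```

Reference "C" is never reported: every hash it shares with the query has already been claimed by "A" or "B" by the time it could be considered.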

This code is still in prototype mode, and does not have all of the features of sourmash. As we add features we will move it back into the core sourmash code base; eventually, much of the code in this repository will be integrated into sourmash directly.

This repo originated as a PyO3-based Python wrapper around the core branchwater code. Branchwater is a fast, low-memory and multithreaded application for searching very large collections of FracMinHash sketches as generated by sourmash.
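To make "FracMinHash sketch" concrete, here is a minimal pure-Python sketch of the idea under stated assumptions. Each canonical k-mer is hashed to a 64-bit integer, and only hashes below 2^64 / `scaled` are kept, so roughly one in `scaled` k-mers ends up in the sketch. Because the same rule is applied to every sequence, sketches can be compared directly by set overlap. The blake2b hash is a stand-in (sourmash itself uses MurmurHash3), and all function names here are hypothetical, not part of the sourmash or plugin API.

```python
import hashlib
import random

# Hashes are treated as unsigned 64-bit integers.
MAX_HASH = 2**64
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)


def hash_kmer(kmer: str) -> int:
    """Map a k-mer to a 64-bit integer.

    Stand-in only: sourmash uses MurmurHash3; blake2b is used here for illustration.
    """
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


def fracminhash(seq: str, ksize: int = 21, scaled: int = 1000) -> set:
    """Keep every k-mer hash below MAX_HASH / scaled (about 1 in `scaled` k-mers)."""
    max_hash = MAX_HASH // scaled
    seq = seq.upper()
    hashes = set()
    for i in range(len(seq) - ksize + 1):
        h = hash_kmer(canonical(seq[i:i + ksize]))
        if h < max_hash:
            hashes.add(h)
    return hashes


def containment(query: set, match: set) -> float:
    """Fraction of the query's sampled hashes that also occur in the match."""
    return len(query & match) / len(query) if query else 0.0


def jaccard(a: set, b: set) -> float:
    """Shared hashes divided by the union of both sketches."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


if __name__ == "__main__":
    random.seed(1)
    genome = "".join(random.choice("ACGT") for _ in range(5000))
    fragment = genome[1000:3000]
    g, q = fracminhash(genome, scaled=1), fracminhash(fragment, scaled=1)
    print(containment(q, g))  # 1.0: every fragment k-mer also occurs in the genome
    print(jaccard(q, g))      # well below 1.0, since the genome is much larger
```

Containment, rather than Jaccard, is what makes it possible to find a small genome inside a large metagenome: it stays high even when the two sketches differ greatly in size.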

For details, see the Rust code in src/ and the Python wrapper in src/python/.

Documentation

There is a quickstart below, as well as more user documentation here. Nascent developer docs are also available!

Quickstart for multisearch.

This quickstart demonstrates multisearch using the 64 genomes from Awad et al., 2017.

1. Install the branchwater plugin

On Linux, you can install the branchwater plugin from conda-forge:

conda install -c conda-forge sourmash_plugin_branchwater

On other platforms (such as Mac OS X) you'll need to install the branchwater plugin in a development environment; please see the developer docs for information.

2. Download sketches.

The following command downloads sourmash sketches for the 64 podar genomes into the file podar-ref.zip:

curl -L https://osf.io/4t6cq/download -o podar-ref.zip

3. Execute!

Now run multisearch to search all the sketches against each other:

sourmash scripts multisearch podar-ref.zip podar-ref.zip -o results.csv --cores 4

You will (hopefully ;)) see a set of results in results.csv. These are comparisons of each query against all matching genomes.
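The comparison multisearch performs can be pictured as an all-against-all loop over two collections. The sketch below is an assumption-laden illustration in plain Python, not the plugin's Rust implementation. Sketches are modeled as Python sets of hashes, `threshold` is a hypothetical containment cutoff, and the CSV column names are illustrative rather than the exact results.csv schema. In the plugin itself, this pairwise work is spread across the threads requested with `--cores`.

```python
import csv
import io
from itertools import product


def multisearch(queries: dict, against: dict, threshold: float = 0.01) -> list:
    """Compare every query sketch to every sketch in `against`.

    Sketches are sets of hashes keyed by name. A pair is reported when the
    fraction of the query's hashes found in the match is at least `threshold`.
    """
    rows = []
    for (qname, qh), (mname, mh) in product(queries.items(), against.items()):
        shared = len(qh & mh)
        if not qh or shared == 0:
            continue  # nothing in common; skip the pair
        containment = shared / len(qh)
        if containment >= threshold:
            rows.append({
                "query_name": qname,
                "match_name": mname,
                "containment": containment,
                "jaccard": shared / len(qh | mh),
                "intersect_hashes": shared,
            })
    return rows


def to_csv(rows: list) -> str:
    """Render result rows as CSV text (column names are illustrative)."""
    fields = ["query_name", "match_name", "containment", "jaccard", "intersect_hashes"]
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    sketches = {"a": {1, 2, 3, 4}, "b": {3, 4, 5, 6}, "c": {7, 8}}
    # Search a collection against itself, as in the quickstart command.
    print(to_csv(multisearch(sketches, sketches, threshold=0.1)))
```

Searching a collection against itself always reports each sketch matching itself with containment 1.0, so those self-matches will appear in the quickstart's results.csv alongside the cross-genome hits.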

Debugging help

If your collections aren't loading properly, try running sourmash sig summarize on them, like so:

sourmash sig summarize podar-ref.zip

This checks that every sketch in the collection can be loaded properly.

Code of Conduct

This project is under the sourmash Code of Conduct.

License

This software is under the AGPL license. Please see LICENSE.txt.

Authors

  • Luiz Irber
  • C. Titus Brown
  • Mohamed Abuelanin
  • N. Tessa Pierce-Ward

Download files


Source Distribution

sourmash_plugin_branchwater-0.8.5.tar.gz (22.1 MB)


File hashes

Hashes for sourmash_plugin_branchwater-0.8.5.tar.gz:

Algorithm     Hash digest
SHA256        e36f70c1e8bb3167dbc1d96ca075258572a6ab6e89eb43e5de9e9464f5641c31
MD5           acff94f8c8337665247d045026e84a2b
BLAKE2b-256   98f47181e6af359e4300ac77153cc03b1438ce90bb82659871e533b12fdd0806

