
fast search and gather extensions for sourmash


pyo3_branchwater


tl;dr Do fast and low-memory search/gather of many sourmash sketches via a sourmash plugin.

Details

This repo contains a PyO3-based Python wrapper around the core branchwater code. Branchwater is a fast, low-memory and multithreaded application for searching very large collections of FracMinHash sketches as generated by sourmash.

For details, see the Rust code in src/ and Python wrapper in python/.

Uses pyo3 for the Python-to-Rust wrapping.

This functionality can be used from within sourmash as a command-line plugin; see the quickstart below.

Documentation

There is a quickstart below, as well as more documentation here.

Quickstart for manysearch.

To try this out, you'll need to install a branch of sourmash that contains sourmash#2438.

This quickstart demonstrates manysearch using the 64 genomes from Awad et al., 2017.

First, install this code.

Install this repo in developer mode:

pip install -e .

Second, download sketches.

The following commands will download sourmash sketches for these genomes and unpack them into the directory podar-ref/:

mkdir -p podar-ref
curl -JLO https://osf.io/4t6cq/download
unzip -u podar-reference-genomes-updated-sigs-2017.06.10.zip

Third, create lists of query and subject files.

manysearch takes in lists of signatures to search, so we need to create those files:

ls -1 podar-ref/{2,47,63}.* > query-list.txt
ls -1 podar-ref/* > podar-ref-list.txt
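
If shell brace expansion is not available (for example on a non-bash shell), the same two lists could be built from Python. This is only a sketch: `write_list` is a hypothetical helper name, not part of this package or of sourmash, and it assumes the sketches were unpacked into podar-ref/ as above.

```python
import glob


def write_list(patterns, out_path):
    """Write every path matching any of the glob patterns to out_path,
    one path per line, and return the sorted list of paths."""
    paths = set()
    for pattern in patterns:
        paths.update(glob.glob(pattern))
    paths = sorted(paths)
    with open(out_path, "w") as fh:
        for path in paths:
            fh.write(path + "\n")
    return paths
```

Calling `write_list(["podar-ref/2.*", "podar-ref/47.*", "podar-ref/63.*"], "query-list.txt")` and `write_list(["podar-ref/*"], "podar-ref-list.txt")` should then select the same files as the two `ls` commands.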

Fourth, execute!

Now run manysearch:

sourmash scripts manysearch query-list.txt podar-ref-list.txt -o results.csv

You will (hopefully ;) see a set of results in results.csv.
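
For a quick look at the output without opening a spreadsheet, something like the following could be used. It is a minimal sketch using only the Python standard library; `summarize_csv` is a hypothetical helper name, and because this README does not list the columns of results.csv, the code makes no assumption about them and simply reports whatever header it finds.

```python
import csv


def summarize_csv(path):
    """Return (column_names, row_count) for a CSV file with a header row.

    column_names is None if the file is empty.
    """
    with open(path, newline="") as fh:
        reader = csv.DictReader(fh)
        n_rows = sum(1 for _ in reader)  # count data rows, excluding the header
        return reader.fieldnames, n_rows
```

For example, `summarize_csv("results.csv")` returns the column names and the number of result rows, which is a cheap way to confirm that the search wrote something.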

Debugging help

If your file lists are not working properly, try running:

sourmash sig summarize query-list.txt
sourmash sig summarize podar-ref-list.txt

to make sure everything can be loaded.
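
A common cause of a broken file list is a path that does not exist relative to the directory you are running from. The sketch below checks only that, assuming one path per line as produced by `ls -1`; it does not check that sourmash can actually load the files, so it complements `sourmash sig summarize` rather than replacing it. `missing_paths` is a hypothetical helper name.

```python
import os


def missing_paths(list_file):
    """Return the entries in list_file that are not existing files on disk."""
    missing = []
    with open(list_file) as fh:
        for line in fh:
            path = line.strip()
            # Skip blank lines; report anything that is not a regular file.
            if path and not os.path.isfile(path):
                missing.append(path)
    return missing
```

An empty result from `missing_paths("query-list.txt")` means every listed path exists; any returned entries are the ones to fix first.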

Future thoughts

The speed and functions of this code will probably be brought into sourmash core in the future, most likely as part of sourmash#2230. However, in the meantime, this is a fun side project that makes use of sourmash plugins and Rust to provide some fast functionality that may be of use to some people, and it can serve as a testbed for future sourmash functionality.


CTB Feb 2023

