
sourmash plugin to calculate common hashes across multiple sketches.


sourmash_plugin_commonhash

If you have sketched many samples and you want to remove "rare" k-mers (present in only one or a few samples), this plugin is for you! This procedure helps reduce noise in Jaccard comparisons between samples.
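To make the filtering idea concrete, here is a minimal sketch in plain Python. It assumes each sketch is simply a set of integer hash values and uses a hypothetical `common_hashes` helper with a `min_samples` threshold of 2 (matching the "2 or more samples" in the example output); it is not the plugin's actual code and does not use the sourmash API. The toy hash values are made up.

```python
from collections import Counter


def common_hashes(sketches, min_samples=2):
    """Return the hashes found in at least `min_samples` sketches (hypothetical helper)."""
    counts = Counter()
    for hashes in sketches.values():
        counts.update(set(hashes))  # count each hash at most once per sample
    return {h for h, n in counts.items() if n >= min_samples}


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two hash sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy "sketches": sample name -> set of hash values (made-up numbers).
sketches = {
    "s1": {1, 2, 3, 10, 11},
    "s2": {1, 2, 3, 20, 21},
    "s3": {1, 2, 30},
}

keep = common_hashes(sketches, min_samples=2)
filtered = {name: hashes & keep for name, hashes in sketches.items()}

print(sorted(keep))                                          # [1, 2, 3]
print(round(jaccard(sketches["s1"], sketches["s2"]), 3))     # 0.429
print(round(jaccard(filtered["s1"], filtered["s2"]), 3))     # 1.0
```

In this toy case, hashes 10, 11, 20 and 21 each occur in only one sample, so they inflate the union without ever contributing to the intersection; dropping them raises the Jaccard similarity of s1 and s2 from 3/7 to 1.0.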

See sourmash#2383 for an extended discussion!

Thanks to Taylor Reiter and Jessica Lumian for all their work on this!

Installation

pip install sourmash_plugin_commonhash

Usage

sourmash scripts commonhash <multiple sketches> -o commonhashes.zip

commonhash will output one filtered sketch for each input sketch. You can then use the various sourmash sig commands to union these sketches, extract individual ones, etc.
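The one-output-per-input behaviour can be sketched the same way. In this minimal, hypothetical example, plain Python sets stand in for sketches and `commonhash_filter` is an illustrative name, not part of sourmash or this plugin. It returns one filtered set for each input; unioning the results or picking out a single one loosely mirrors what you would do afterwards with the `sourmash sig` commands.

```python
from collections import Counter


def commonhash_filter(sketches, min_samples=2):
    """Return one filtered hash set per input sketch (hypothetical helper).

    A hash is kept only if it occurs in at least `min_samples` of the inputs.
    """
    # Count each hash once per input sketch.
    counts = Counter(h for hashes in sketches.values() for h in set(hashes))
    keep = {h for h, n in counts.items() if n >= min_samples}
    # Summary line modeled on the plugin's example output.
    print(f"Of {len(counts)} hashes, keeping {len(keep)} that are in "
          f"{min_samples} or more samples.")
    return {name: set(hashes) & keep for name, hashes in sketches.items()}


# Made-up toy inputs: sample name -> set of hash values.
sketches = {
    "a": {5, 6, 7, 8},
    "b": {5, 6, 9},
    "c": {6, 7, 42},
}

filtered = commonhash_filter(sketches)    # one output per input sketch
merged = set().union(*filtered.values())  # analogous to unioning the outputs
only_b = filtered["b"]                    # analogous to extracting one output

print(sorted(merged))  # [5, 6, 7]
print(sorted(only_b))  # [5, 6]
```

Because every kept hash occurs in at least one input, the union of all the filtered outputs is exactly the set of common hashes.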

Example

sourmash scripts commonhash 

should yield:

...

Selecting k=31, DNA
Loaded 10587 hashes from 3 sketches in 3 files.
Of 10587 hashes, keeping 2529 that are in 2 or more samples.
Saved 3 signatures to 'commonhash.zip'

Support

We suggest filing issues in the main sourmash issue tracker, as that receives more attention!

Dev docs

commonhash is developed at https://github.com/ctb/sourmash_plugin_commonhash.

Generating a release

Bump the version number in pyproject.toml and push.

Make a new release on GitHub.

Then pull, and:

python -m build

followed by twine upload dist/....



Download files


Source Distribution

sourmash_plugin_commonhash-0.2.tar.gz (4.4 kB)

Uploaded Source

Built Distribution


sourmash_plugin_commonhash-0.2-py3-none-any.whl (4.9 kB)

Uploaded Python 3

File details

Details for the file sourmash_plugin_commonhash-0.2.tar.gz.

File metadata

  • Download URL: sourmash_plugin_commonhash-0.2.tar.gz
  • Size: 4.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.12

File hashes

Hashes for sourmash_plugin_commonhash-0.2.tar.gz
Algorithm Hash digest
SHA256 9bcbd47848bef26e3b8419541530ba1b41e0592ea041f955b1b79ef4189ca601
MD5 7fa4d9bda5074fb186826b06cf813442
BLAKE2b-256 1441bf0530e9edede06657ba1d03776253a02564a98bee614ecf50ffdb93eeb4


File details

Details for the file sourmash_plugin_commonhash-0.2-py3-none-any.whl.


File hashes

Hashes for sourmash_plugin_commonhash-0.2-py3-none-any.whl
Algorithm Hash digest
SHA256 f4472496a54ff128a86b66510565b026897937b99e716d77c833f6f38ad56ecc
MD5 e258cdd3b04b9e7ad3625af093adae16
BLAKE2b-256 3425d6217419be7733ee5f913264a0a91b51b652d493f7b47932298bcf464c79

