diverse-seq provides alignment-free algorithms to facilitate phylogenetic workflows

diverse-seq implements computationally efficient alignment-free algorithms that enable rapid prototyping for phylogenetic workflows. It can accelerate parameter selection searches for sequence alignment and phylogeny estimation by identifying a subset of sequences that is representative of the diversity in a collection. We show that selecting representative sequences with an entropy measure of k-mer frequencies corresponds well to sampling via conventional genetic distances. The run time is linear with respect to the number of sequences, and the computation can be run in parallel. Applied to a collection of 10.5k whole microbial genomes on a laptop, it took ~12 minutes to prepare the data and ~2 minutes to select 100 representatives. diverse-seq can further boost the performance of phylogenetic estimation by providing a seed phylogeny that can be refined by a more sophisticated algorithm. For ~1k whole microbial genomes on a laptop, it takes ~1.8 minutes to estimate a bifurcating tree from mash distances.
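To give a feel for the entropy measure mentioned above, here is a minimal pure-Python sketch of the Jensen-Shannon divergence (JSD) between k-mer frequency distributions. It is illustrative only: the function names (kmer_freqs, entropy, jsd), the default k and the equal weighting of sequences are assumptions made for this example, and it is not the diverse-seq implementation.

```python
from collections import Counter
from math import log2


def kmer_freqs(seq: str, k: int) -> dict[str, float]:
    """Return the relative frequency of each overlapping k-mer in seq."""
    counts = Counter(seq[i : i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


def entropy(freqs: dict[str, float]) -> float:
    """Shannon entropy, in bits, of a frequency distribution."""
    return -sum(p * log2(p) for p in freqs.values() if p > 0)


def jsd(seqs: list[str], k: int = 2) -> float:
    """JSD of the k-mer distributions of seqs, each weighted equally."""
    dists = [kmer_freqs(s, k) for s in seqs]
    n = len(dists)
    # Mean distribution: average the frequency of every k-mer across seqs.
    mean: dict[str, float] = {}
    for d in dists:
        for kmer, p in d.items():
            mean[kmer] = mean.get(kmer, 0.0) + p / n
    # JSD = entropy of the mean minus the mean of the entropies.
    return entropy(mean) - sum(entropy(d) for d in dists) / n


same = jsd(["ACGTACGT", "ACGTACGT"])  # identical k-mer content
different = jsd(["AAAAAAAA", "CCCCCCCC"])  # no k-mers in common
print(same, different)
```

Identical sequences contribute no divergence, while sequences with no k-mers in common give the largest value possible for two sequences (1 bit). A set of sequences with a high JSD therefore covers more of the k-mer diversity in a collection.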

You can read more about the methods implemented in diverse-seq in the paper here.

The user documentation is here.

📣 Announcements 📣

Reimplemented core routines in Rust!

The prep step takes approximately the same amount of time. Sampling divergent sequences is ~2x faster 🏎️🎉.

Warning -- backwards incompatible changes

The Rust rewrite was accompanied by a switch to using the Zarr storage format instead of HDF5. The output file from dvs prep now has the suffix .dvseqsz instead of .dvseq. Old-format files are not compatible with this version.
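If you have a directory of prepared data, a small script can help identify which outputs predate the format change. The sketch below only inspects file name suffixes; the function name split_by_format is hypothetical and is not part of diverse-seq.

```python
from pathlib import Path


def split_by_format(directory: str) -> tuple[list[Path], list[Path]]:
    """Return (old-format .dvseq paths, new-format .dvseqsz paths)."""
    root = Path(directory)
    # Match on the suffix only, so both files and directories are reported.
    old = sorted(p for p in root.iterdir() if p.suffix == ".dvseq")
    new = sorted(p for p in root.iterdir() if p.suffix == ".dvseqsz")
    return old, new


old, new = split_by_format(".")
print(f"{len(old)} old-format and {len(new)} new-format outputs found")
```

Anything listed as old-format would need to be regenerated with dvs prep from the original sequence data.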

Installation

For the full Jupyter experience, we recommend installing diverse-seq from PyPI with the extra dependencies:

pip install "diverse-seq[extra]"

For command-line-only usage, install as follows:

pip install diverse-seq

NOTE: If you experience any errors during installation, we recommend using uv pip, which provides much better error messages than the standard pip command. If you cannot resolve the installation problem, please open an issue on the GitHub repository.

Using uv

uv also provides a simpler way to install dvs as a command-line-only tool:

uv tool install diverse-seq

In this case, invoke dvs as follows:

uvx --from diverse-seq dvs

Dependencies

For a full listing of dependencies, see the pyproject.toml file.

The command line interface

dvs is the command line interface for diverse-seq.

The `dvs` subcommands
Usage: dvs [OPTIONS] COMMAND [ARGS]...

  dvs -- alignment free detection of the most diverse sequences using JSD

Options:
  --version  Show the version and exit.
  --help     Show this message and exit.

Commands:
  demo-data  Export a demo sequence file
  prep       Writes processed sequences to <Zarr Storage>.dvseqsz.
  max        Identify the seqs that maximise average delta JSD
  nmost      Identify n seqs that maximise average delta JSD
  ctree      Quickly compute a cluster tree based on kmers for a collection...
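For background on the k-mer based distances that a cluster tree can be built from, the following is a rough sketch of the mash distance between two sequences, estimated from bottom-s MinHash sketches. It is a teaching example under stated assumptions, not the code that dvs ctree runs: the function names, the use of SHA-1 as the hash, and the default k and sketch size are all choices made for this illustration.

```python
import hashlib
from math import log


def bottom_sketch(seq: str, k: int, size: int) -> set[int]:
    """Return the `size` smallest hash values over all k-mers in seq."""
    hashes = {
        int.from_bytes(hashlib.sha1(seq[i : i + k].encode()).digest()[:8], "big")
        for i in range(len(seq) - k + 1)
    }
    return set(sorted(hashes)[:size])


def mash_distance(a: str, b: str, k: int = 4, size: int = 100) -> float:
    """Estimate the mash distance between a and b from MinHash sketches."""
    sk_a, sk_b = bottom_sketch(a, k, size), bottom_sketch(b, k, size)
    # Estimate the Jaccard index from the smallest hashes of the union.
    union = set(sorted(sk_a | sk_b)[:size])
    shared = len(union & sk_a & sk_b)
    if shared == 0:
        return 1.0  # no shared k-mers: report the maximum distance
    j = shared / len(union)
    # Mash distance: D = -(1/k) * ln(2j / (1 + j)); clamp -0.0 to 0.0.
    return max(0.0, -log(2 * j / (1 + j)) / k)


print(mash_distance("ACGTACGTAC", "ACGTACGTAC"))
print(mash_distance("AAAAAAAA", "CCCCCCCC"))
```

Because only a fixed-size sketch of each sequence is compared, the cost of a pairwise comparison does not grow with sequence length once the sketches have been computed.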

The Python API

We make comparable capabilities available as cogent3 apps. The main difference is that the app instances operate directly on, and return, cogent3 sequence collections. See the docs for demonstrations of how to use the apps.

Project Information

diverse-seq is released under the BSD-3 license. If you want to contribute to the diverse-seq project (and we hope you do! 😇) the code of conduct and other useful developer information is available on the wiki.
