
Fast fuzzy search over biological sequences (C++ core, Python bindings)


seqtree


Fast fuzzy search over biological sequences (amino-acid or nucleotide), as a C++ core with a minimal Python binding. Build an immutable index once, then search single queries or massive batches in parallel.

Two search engines over one trie:

  • seqtm — branch-and-bound enumeration. Exact per-type edit caps (max_subs / max_ins / max_dels) and a fast Hamming-only path. Best for small edit distances (UMI collapse, error correction, CDR3/epitope matching).
  • seqtrie — banded edit-distance DP. Matrix-weighted score budgets (BLOSUM62 + gap costs), cost independent of edit count. Best for similarity-scored searches.
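To make the branch-and-bound idea concrete, here is a minimal pure-Python sketch of enumeration over a trie with separate caps per edit type. It is not seqtree's C++ implementation. The helper names `build_trie` and `search` are hypothetical. The convention that an insertion is an extra character in the query and a deletion is a reference character missing from the query is an assumption.

```python
from typing import Dict, List, Tuple


def build_trie(refs: List[str]) -> dict:
    """Build a nested-dict trie; the '$' key holds ids of references ending at a node."""
    root: dict = {}
    for ref_id, seq in enumerate(refs):
        node = root
        for ch in seq:
            node = node.setdefault(ch, {})
        node.setdefault("$", []).append(ref_id)
    return root


def search(root: dict, query: str, max_subs: int = 0, max_ins: int = 0,
           max_dels: int = 0) -> Dict[int, Tuple[int, int, int]]:
    """Return {ref_id: (n_subs, n_ins, n_dels)}, keeping the fewest total edits per reference."""
    best: Dict[int, Tuple[int, int, int]] = {}
    seen = set()  # visited (node, query position, edit counts) states

    def walk(node: dict, i: int, s: int, n_i: int, d: int) -> None:
        state = (id(node), i, s, n_i, d)
        if state in seen:
            return
        seen.add(state)
        if i == len(query):
            for ref_id in node.get("$", []):
                old = best.get(ref_id)
                if old is None or s + n_i + d < sum(old):
                    best[ref_id] = (s, n_i, d)
        # Assumed convention: insertion = extra character in the query.
        if i < len(query) and n_i < max_ins:
            walk(node, i + 1, s, n_i + 1, d)
        for ch, child in node.items():
            if ch == "$":
                continue
            if i < len(query):
                if ch == query[i]:
                    walk(child, i + 1, s, n_i, d)          # match, no cost
                elif s < max_subs:
                    walk(child, i + 1, s + 1, n_i, d)      # substitution
            # Assumed convention: deletion = reference character missing from the query.
            if d < max_dels:
                walk(child, i, s, n_i, d + 1)

    walk(root, 0, 0, 0, 0)
    return best


refs = ["CASSLAPGATNEKLFF", "CASSLELGATNEKLFF"]
trie = build_trie(refs)
hits = search(trie, "CASSLAPGATNEKLFF", max_subs=2)
print(hits)  # {0: (0, 0, 0), 1: (2, 0, 0)}
```

A branch is abandoned as soon as the next edit would exceed its cap, so the work grows with the edit budget rather than with the number of references. That matches the stated fit for small edit distances.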

engine="auto" picks one per query. Results are payload-agnostic: (ref_id, score, n_subs, n_ins, n_dels). Downstream libraries map ref_id back to their own payloads (V gene, MHC, counts) and filter.
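A downstream join from ref_id back to payloads might look like the sketch below. It does not call seqtree: `Hit` is a stand-in namedtuple carrying the five fields listed above, `attach_payloads` is a hypothetical helper, and the payload values and hits are invented placeholders. It also assumes ref_id is the position of the sequence in the list passed at build time.

```python
from collections import namedtuple

# Stand-in for a search result; field names follow the tuple described above.
Hit = namedtuple("Hit", ["ref_id", "score", "n_subs", "n_ins", "n_dels"])


def attach_payloads(hits, payloads, keep=lambda payload: True):
    """Join hits to payloads by ref_id and keep only pairs whose payload passes `keep`."""
    return [
        (hit, payloads[hit.ref_id])
        for hit in hits
        if keep(payloads[hit.ref_id])
    ]


# Invented payloads, one per reference, in the same order as the indexed sequences.
payloads = [
    {"v_gene": "TRBV7-9", "count": 12},
    {"v_gene": "TRBV5-1", "count": 3},
]

# Made-up hits for illustration only, not real search output.
hits = [Hit(0, 0, 0, 0, 0), Hit(1, 2, 2, 0, 0)]

matched = attach_payloads(hits, payloads, keep=lambda p: p["v_gene"] == "TRBV7-9")
for hit, payload in matched:
    print(hit.ref_id, hit.n_subs, payload["v_gene"], payload["count"])
```

Keeping payloads in a separate list or table means the index itself only ever stores sequences, so the same index can serve different annotation schemes.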

Build

bash setup.sh            # repo-local .venv + editable install
bash setup.sh --tests    # + pytest
bash setup.sh --bench     # + benchmark deps (huggingface_hub, pandas, psutil)

Quickstart

import seqtree

idx = seqtree.Index.build(["CASSLAPGATNEKLFF", "CASSLELGATNEKLFF"], alphabet="aa")

p = seqtree.SearchParams(max_subs=2, engine="seqtm")
for hit in idx.search("CASSLAPGATNEKLFF", p):
    print(hit.ref_id, hit.score, hit.n_subs)

# parallel batch (releases the GIL)
queries = ["CASSLAPGATNEKLFF", "CASSLELGATNEKLFF"]   # placeholder query list
results = idx.search_batch(queries, p, threads=0)   # 0 = all cores

# matrix-weighted budget
pm = seqtree.SearchParams(matrix="BLOSUM62", max_penalty=12, engine="seqtrie")
top = idx.search_top("CASSLAPGATNEKLFF", pm, k=5)

# alignment on demand
aln = idx.align(0, "CASSLELGATNEKLFF", p)
print(aln.aligned_query, aln.aligned_ref, aln.ops)

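# batch-vs-batch (auto-indexes the larger set)
query_set = ["CASSLAPGATNEKLFF"]                      # placeholder inputs
db_set = ["CASSLAPGATNEKLFF", "CASSLELGATNEKLFF"]
pairs = seqtree.pairwise_batch(query_set, db_set, p, alphabet="aa")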

Tests

cmake -S . -B build -G Ninja -DSEQTREE_TESTS=ON
cmake --build build
ctest --test-dir build           # C++ unit tests
pytest tests/python              # Python tests

Benchmarks

python bench/bench.py                                   # fast tier (real VDJdb data)
env RUN_BENCHMARK=1 python bench/bench.py --sizes 1000000 --queries 1000000 --threads 16

Development

This repo follows git-flow:

  • master — stable, release-ready; CI + docs deploy run here.
  • dev — integration branch for day-to-day work.
  • feature branches branch off dev and merge back via PR; releases merge dev → master.

Roadmap (affine gaps, position-specific matrices, e-value / significance via control-set and tf-idf, succinct memory packing) lives in docs/roadmap.rst.
