
Fast fuzzy search over biological sequences (C++ core, Python bindings)


seqtree


Fast fuzzy search over biological sequences (amino-acid or nucleotide), as a C++ core with a minimal Python binding. Build an immutable index once, then search single queries or massive batches in parallel.

Two search engines over one trie:

  • seqtm — branch-and-bound enumeration. Exact per-type edit caps (max_subs / max_ins / max_dels) and a fast Hamming-only path. Best for small edit distances (UMI collapse, error correction, CDR3/epitope matching).
  • seqtrie — banded edit-distance DP. Matrix-weighted score budgets (BLOSUM62 + gap costs), cost independent of edit count. Best for similarity-scored searches.
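The seqtm idea can be sketched in pure Python as a depth-first walk over a dict-based trie in which the per-type caps act as the bound: a branch is abandoned as soon as it would exceed max_subs, max_ins or max_dels. This is a minimal sketch under stated assumptions, not seqtree's implementation. `Node`, `build_trie` and `search` are hypothetical names. The code assumes that an insertion is an extra query character and a deletion is a reference character missing from the query, because the README does not define this convention. The sketch also shows why enumeration suits small edit distances: every extra allowed edit multiplies the number of branches explored.

```python
from typing import Dict, List, Tuple


class Node:
    """Trie node; ref_ids lists the references that end exactly here."""

    def __init__(self) -> None:
        self.children: Dict[str, "Node"] = {}
        self.ref_ids: List[int] = []


def build_trie(refs: List[str]) -> Node:
    root = Node()
    for ref_id, seq in enumerate(refs):
        node = root
        for ch in seq:
            node = node.children.setdefault(ch, Node())
        node.ref_ids.append(ref_id)
    return root


def search(root: Node, query: str, max_subs: int, max_ins: int,
           max_dels: int) -> Dict[int, Tuple[int, int, int]]:
    """Map each reachable ref_id to (n_subs, n_ins, n_dels) with the fewest total edits.

    Convention (assumed): insertion = extra query char, deletion = ref char absent from query.
    """
    best: Dict[int, Tuple[int, int, int]] = {}

    def visit(node: Node, i: int, subs: int, ins: int, dels: int) -> None:
        if i == len(query):  # query fully consumed: references ending here are hits
            for ref_id in node.ref_ids:
                if ref_id not in best or subs + ins + dels < sum(best[ref_id]):
                    best[ref_id] = (subs, ins, dels)
        if i < len(query) and ins < max_ins:  # skip an extra query character
            visit(node, i + 1, subs, ins + 1, dels)
        for ch, child in node.children.items():
            if i < len(query):
                if ch == query[i]:            # free match
                    visit(child, i + 1, subs, ins, dels)
                elif subs < max_subs:         # substitution, only if the cap allows
                    visit(child, i + 1, subs + 1, ins, dels)
            if dels < max_dels:               # reference character missing from query
                visit(child, i, subs, ins, dels + 1)

    visit(root, 0, 0, 0, 0)
    return best


root = build_trie(["CASSLAPGATNEKLFF", "CASSLELGATNEKLFF"])
print(search(root, "CASSLAPGATNEKLFF", max_subs=2, max_ins=0, max_dels=0))
# {0: (0, 0, 0), 1: (2, 0, 0)}
```

Setting max_ins = max_dels = 0 reduces the walk to the Hamming-only case, where each query position either matches or spends one substitution.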

engine="auto" picks one per query. Results are payload-agnostic: (ref_id, score, n_subs, n_ins, n_dels). Downstream libraries map ref_id back to their own payloads (V gene, MHC, counts) and filter.
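Because hits carry only a ref_id, the caller keeps its own payloads, typically in a list aligned with the order in which sequences were indexed. The sketch below joins hits to payloads and applies a downstream filter. It rests on several assumptions. `Hit` is a hypothetical stand-in namedtuple that uses the field names listed above; it is not seqtree's result class. ref_id is assumed to equal the position of the sequence in the build list. The payload values and scores are placeholders.

```python
from collections import namedtuple
from typing import Any, Dict, List

# Hypothetical stand-in for a hit record: (ref_id, score, n_subs, n_ins, n_dels).
Hit = namedtuple("Hit", ["ref_id", "score", "n_subs", "n_ins", "n_dels"])


def annotate(hits: List[Hit], payloads: List[Dict[str, Any]],
             max_indels: int = 0) -> List[Dict[str, Any]]:
    """Join hits to caller-owned payloads by ref_id, dropping hits with too many indels."""
    kept = []
    for hit in hits:
        if hit.n_ins + hit.n_dels > max_indels:
            continue  # example downstream filter: substitution-only matches by default
        kept.append({**payloads[hit.ref_id], "ref_id": hit.ref_id, "score": hit.score})
    return kept


# Placeholder payloads, stored in the same order the sequences were indexed (assumed).
payloads = [
    {"v_gene": "V_A", "count": 12},
    {"v_gene": "V_B", "count": 3},
    {"v_gene": "V_C", "count": 7},
]
hits = [Hit(0, 0, 0, 0, 0), Hit(1, 2, 2, 0, 0), Hit(2, 1, 0, 1, 0)]  # placeholder scores
print(annotate(hits, payloads))
```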

Install

pip install seqtree       # prebuilt wheels for CPython 3.10–3.13 (Linux x86-64, macOS ARM64, Windows x86-64)

Build from source

bash setup.sh            # repo-local .venv + editable install
bash setup.sh --tests    # + pytest
bash setup.sh --bench    # + benchmark deps (huggingface_hub, pandas, psutil)

Quickstart

import seqtree

idx = seqtree.Index.build(["CASSLAPGATNEKLFF", "CASSLELGATNEKLFF"], alphabet="aa")

p = seqtree.SearchParams(max_subs=2, engine="seqtm")
for hit in idx.search("CASSLAPGATNEKLFF", p):
    print(hit.ref_id, hit.score, hit.n_subs)

# parallel batch (releases the GIL)
queries = ["CASSLAPGATNEKLFF", "CASSLELGATNEKLFF", "CASSLAPGATNEKLYF"]
results = idx.search_batch(queries, p, threads=0)   # 0 = all cores

# matrix-weighted budget
pm = seqtree.SearchParams(matrix="BLOSUM62", max_penalty=12, engine="seqtrie")
top = idx.search_top("CASSLAPGATNEKLFF", pm, k=5)

# alignment on demand
aln = idx.align(0, "CASSLELGATNEKLFF", p)
print(aln.aligned_query, aln.aligned_ref, aln.ops)

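# batch-vs-batch (auto-indexes the larger set)
query_set = ["CASSLAPGATNEKLFF", "CASSLELGATNEKLYF"]
db_set = ["CASSLAPGATNEKLFF", "CASSLELGATNEKLFF", "CASSIRSSYEQYF"]
pairs = seqtree.pairwise_batch(query_set, db_set, p, alphabet="aa")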
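A plain pairwise banded DP can illustrate the matrix-weighted budget used by the seqtrie engine. The DP assumes non-negative penalties. Each gap costs at least gap_cost, so an alignment that stays within max_penalty cannot move more than max_penalty // gap_cost cells off the diagonal. Any row whose cells all exceed the budget proves that the pair cannot qualify, because every alignment path crosses every row. This is a sketch, not seqtree's engine. `toy_sub` is a hypothetical penalty function standing in for BLOSUM62. `banded_penalty` and `gap_cost` are illustrative names. The README does not describe how seqtree turns BLOSUM62 scores into penalties or how it shares DP rows across trie nodes.

```python
from typing import Callable, Optional


def toy_sub(a: str, b: str) -> int:
    """Hypothetical substitution penalty (not BLOSUM62): 0 match, 1 within a tiny
    'similar residue' group, 3 otherwise."""
    if a == b:
        return 0
    if {a, b} <= set("ILV") or {a, b} <= set("DE"):
        return 1
    return 3


def banded_penalty(query: str, ref: str, sub_cost: Callable[[str, str], int],
                   gap_cost: int, max_penalty: int) -> Optional[int]:
    """Minimal global alignment penalty of query vs ref, or None if it exceeds max_penalty.

    Assumes sub_cost(a, b) >= 0 and gap_cost > 0.
    """
    n, m = len(query), len(ref)
    band = max_penalty // gap_cost  # max distance from the diagonal within budget
    if abs(n - m) > band:
        return None
    over = max_penalty + 1          # sentinel meaning "already over budget"
    prev = [j * gap_cost if j <= band else over for j in range(m + 1)]
    for i in range(1, n + 1):
        cur = [over] * (m + 1)
        if i <= band:
            cur[0] = i * gap_cost
        for j in range(max(1, i - band), min(m, i + band) + 1):
            cur[j] = min(
                prev[j - 1] + sub_cost(query[i - 1], ref[j - 1]),  # match / substitution
                prev[j] + gap_cost,                                  # gap in ref
                cur[j - 1] + gap_cost,                               # gap in query
                over,
            )
        if min(cur) > max_penalty:  # every alignment path crosses this row: prune
            return None
        prev = cur
    return prev[m] if prev[m] <= max_penalty else None


print(banded_penalty("CASSLAPG", "CASSLELG", toy_sub, gap_cost=4, max_penalty=12))  # 6
```

The work per comparison is bounded by the band width and not by how many edits the final alignment uses, which is what makes a score budget practical compared with enumerating edit combinations.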

Tests

cmake -S . -B build -G Ninja -DSEQTREE_TESTS=ON
cmake --build build
ctest --test-dir build           # C++ unit tests
pytest tests/python              # Python tests

Benchmarks

python bench/bench.py                                   # recall vs ground truth (real VDJdb data)
python bench/bench_gnuplot.py                           # max-edit-3 throughput → SVG figures (needs gnuplot)
env RUN_BENCHMARK=1 python bench/bench.py --sizes 1000000 --queries 1000000 --threads 16

bench/bench_gnuplot.py renders queries/ms vs reference-set size (both engines), peak RSS, and alignment-fetch cost. See docs/benchmarks.rst.
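Recall against ground truth is commonly computed by comparing reported hits with an exhaustive brute-force scan. The README does not say how bench/bench.py defines recall. The sketch below uses one common definition, micro-averaged over all true (query, ref) pairs, with a Hamming-only ground truth. The function names are hypothetical.

```python
from typing import Dict, List, Optional, Set


def hamming(a: str, b: str) -> Optional[int]:
    """Number of mismatches, or None when lengths differ (Hamming distance undefined)."""
    if len(a) != len(b):
        return None
    return sum(x != y for x, y in zip(a, b))


def brute_force_truth(queries: List[str], refs: List[str],
                      max_subs: int) -> Dict[int, Set[int]]:
    """Exhaustive ground truth: every ref within max_subs mismatches of each query."""
    truth: Dict[int, Set[int]] = {}
    for qi, q in enumerate(queries):
        matches = set()
        for ri, r in enumerate(refs):
            d = hamming(q, r)
            if d is not None and d <= max_subs:
                matches.add(ri)
        truth[qi] = matches
    return truth


def recall(found: Dict[int, Set[int]], truth: Dict[int, Set[int]]) -> float:
    """Fraction of true (query, ref) pairs that the search reported."""
    total = sum(len(expected) for expected in truth.values())
    if total == 0:
        return 1.0  # convention: nothing to find counts as perfect recall
    recovered = sum(len(found.get(qi, set()) & expected) for qi, expected in truth.items())
    return recovered / total


refs = ["CASS", "CATS", "CAGG"]
queries = ["CASS", "CAGS"]
truth = brute_force_truth(queries, refs, max_subs=1)  # {0: {0, 1}, 1: {0, 1, 2}}
found = {0: {0, 1}, 1: {2}}                           # e.g. hits from an approximate search
print(recall(found, truth))                           # 0.6
```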

Development

This repo follows git-flow:

  • master — stable, release-ready; CI + docs deploy run here.
  • dev — integration branch for day-to-day work.
  • feature branches branch off dev and merge back via PR; releases merge dev into master.

Roadmap (affine gaps, position-specific matrices, e-value / significance via control-set and tf-idf, succinct memory packing) lives in docs/roadmap.rst.


Download files

Download the file for your platform.

Source distribution

  • seqtree-0.0.2.tar.gz (1.1 MB)

Built distributions

  • seqtree-0.0.2-cp313-cp313-win_amd64.whl (1.2 MB): CPython 3.13, Windows x86-64
  • seqtree-0.0.2-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB): CPython 3.13, manylinux (glibc 2.17+) x86-64
  • seqtree-0.0.2-cp313-cp313-macosx_11_0_arm64.whl (1.2 MB): CPython 3.13, macOS 11.0+ ARM64
  • seqtree-0.0.2-cp312-cp312-win_amd64.whl (1.2 MB): CPython 3.12, Windows x86-64
  • seqtree-0.0.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB): CPython 3.12, manylinux (glibc 2.17+) x86-64
  • seqtree-0.0.2-cp312-cp312-macosx_11_0_arm64.whl (1.2 MB): CPython 3.12, macOS 11.0+ ARM64
  • seqtree-0.0.2-cp311-cp311-win_amd64.whl (1.2 MB): CPython 3.11, Windows x86-64
  • seqtree-0.0.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB): CPython 3.11, manylinux (glibc 2.17+) x86-64
  • seqtree-0.0.2-cp311-cp311-macosx_11_0_arm64.whl (1.2 MB): CPython 3.11, macOS 11.0+ ARM64
  • seqtree-0.0.2-cp310-cp310-win_amd64.whl (1.2 MB): CPython 3.10, Windows x86-64
  • seqtree-0.0.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB): CPython 3.10, manylinux (glibc 2.17+) x86-64
  • seqtree-0.0.2-cp310-cp310-macosx_11_0_arm64.whl (1.2 MB): CPython 3.10, macOS 11.0+ ARM64

File details

All seqtree 0.0.2 release files carry attestation bundles published by publish.yml on antigenomics/seqtree. Attestation values reflect the state when the release was signed and may no longer be current. For the source distribution and the Windows wheels, PyPI also records an upload via Trusted Publishing using twine/6.1.0 on CPython/3.13.12.

File hashes:

  • seqtree-0.0.2.tar.gz
      SHA256 73fcddf0615dc3363d383affa8138502e0a9c206015c19641e6c342e01ae7d4f
      MD5 3af79dee3580229a1bb991dcccf496ee
      BLAKE2b-256 b05240dc5db6c6f89b96a83cf8f04071d7cf448b6aa7f943cd843cfbd8340fa1
  • seqtree-0.0.2-cp313-cp313-win_amd64.whl
      SHA256 4647b4a6d82b53b6651497af30a6062fb9b1e6320c27490afeff5d877e3ce24c
      MD5 5587afe64d6eb016d1de42a58f828929
      BLAKE2b-256 d0d42e5966af038eafcb43c3d229c61e744353f446c8a190036e93c48e41167b
  • seqtree-0.0.2-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
      SHA256 a642c7d0e8feb55290661f358db639737a8136c07f24d233040129598adaf6b2
      MD5 86d4a4ac73844f686cfaa26699d146aa
      BLAKE2b-256 8b3d224c9f2a81bf8e058969e5d18594bb837efba26504e4e39b0dcec7ab52c8
  • seqtree-0.0.2-cp313-cp313-macosx_11_0_arm64.whl
      SHA256 a765858ca56b11cb3e494e2a4a900695ca36e56c1d19340d1272cb91232aebda
      MD5 8709b4c5ae0002c7dd6f8c60332f1330
      BLAKE2b-256 e25fe9a6f26a260e1c87bbd24fd3b5bcdaa26edbf2ea83a92d4ab2ba07233d85
  • seqtree-0.0.2-cp312-cp312-win_amd64.whl
      SHA256 202a6ba9caf35d757056ea494e3960ad72f0b26cf5635b890f302f73b3c3a651
      MD5 a17b069da98d8ee80115ddc3ef427f9d
      BLAKE2b-256 b51829ce595af9994a4c63f838c3dffb677d7bb277a45a07504a98a9bed055d7
  • seqtree-0.0.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
      SHA256 3eb93f01bb6009c563f40d415b65ae9282e2dfe8666952cbb6881424cc0d1f28
      MD5 905e3ff0f88e5bd8db2c4c4efec07d61
      BLAKE2b-256 778c1c4b1594fc8a04158103ef7014c3a1e62b28ec24c154e8670c8099ff18ca
  • seqtree-0.0.2-cp312-cp312-macosx_11_0_arm64.whl
      SHA256 58d367e913336a2752adafcc4b8aace1e4a68d94d072197d772ecb5c1dee3871
      MD5 579bb52d2587652c6041c809d47ac9c7
      BLAKE2b-256 cf989f78d8460ef325367523fb55f988667aaaa473bcc439e082ccc4638a8f6d
  • seqtree-0.0.2-cp311-cp311-win_amd64.whl
      SHA256 712f092977cf5c02b400424bf1430aa5e226c7e598a53ca67c19f81d9cb49ecc
      MD5 a86efd41eaff41e904274cfb387bf604
      BLAKE2b-256 514cedee126d5b11ebc1d3939f208d058a1fa149f5a5d327a10c22de735da680
  • seqtree-0.0.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
      SHA256 8cacc3906664e21f06bfdfe703247c2a7a0478358bf5ad2418e6745258a8213f
      MD5 e92a3dbef07b0adba9cd60be7dccb524
      BLAKE2b-256 55ad3f74cd0f01e029322e31e24cd16e948c287d1b8924ee3827b0554fba499f
  • seqtree-0.0.2-cp311-cp311-macosx_11_0_arm64.whl
      SHA256 b8757fea418555d3b2accb56d1b5572dc682d51d2f1358f3be8b0a7d6d0e52f5
      MD5 d15a7de708843ce0da6b89a636bb7df7
      BLAKE2b-256 5218a261d3c81b1f66f256d285e45f383ff3d8d608759e67c3f516867bb7d336
  • seqtree-0.0.2-cp310-cp310-win_amd64.whl
      SHA256 f8b1d3bb726afb74097df8f372addcf60eac562d11bf48ae38c4f556581dfdf0
      MD5 ca9f3dee2b50421ba2fc3b797a2507ec
      BLAKE2b-256 ef451c454a6795315e86a3e4584c31effffc2232e9b895853bb70067913fc1d4
  • seqtree-0.0.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
      SHA256 b051042bbb51128b14e2712f727c2a5bcaa0ca3c1299329c335f4a8b1a95f579
      MD5 148bf795056210ee884cca6a05fdcad9
      BLAKE2b-256 95e8cd638029341115f2f0beeec3516fab1578e1952826f0fb8eb1ce75156522
  • seqtree-0.0.2-cp310-cp310-macosx_11_0_arm64.whl
      SHA256 45ea19bdafa3a5fa29b634cd1a2be7ae98f6a97a41b76aaffcc1cc09829a5a7b
      MD5 d1a12134af0efaa76a58a91f09a37375
      BLAKE2b-256 be07406450046f3d3e6405117585cdb9c611665250abbc8e311c85756ebacc80
