Exact pairwise and 3D sequence alignment utilities

SGAD

Semi-Global Alignment for Dimer calculation.

SGAD provides exact Needleman-Wunsch dynamic programming for 2-sequence and 3-sequence alignment, with symmetry-aware affine-gap scoring.

Core alignment controls in this package:

  • Four free-end flags (semiglobal behavior): seq1_left_free, seq1_right_free, seq2_left_free, seq2_right_free (and seq3 variants in 3D).
  • Position-biased / weighted scoring via score_scale_fn (2D API only).
  • Gap-close penalty via enable_gap_close_penalty (default True), which splits the difference between the affine gap-open and gap-extend penalties into an open term and a close term. This improves symmetry under sequence reversal and swapping, and keeps the DP score consistent with rescoring. Score invariance under complementing additionally requires a complement-symmetric substitution matrix.
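
The symmetry argument behind the gap-close term can be illustrated with a small rescoring sketch. This is not SGAD's implementation: the function name rescore, the even split of the open-vs-extend delta between the first and last column of a gap run, and the per-column weights are all assumptions made for illustration. With position-dependent weights, charging the whole delta at the opening column gives a different score once the alignment is reversed, while splitting it across both ends does not.

```python
def rescore(a1, a2, weights, split_delta, match=2.0, mismatch=-1.0,
            gap_open=-5.0, gap_extend=-1.0):
    """Weighted affine rescoring of a gapped pairwise alignment (illustrative only)."""
    delta = gap_open - gap_extend  # extra cost of opening beyond extending
    n = len(a1)
    total = 0.0
    for i in range(n):
        w = weights[i]
        if a1[i] != "-" and a2[i] != "-":
            total += w * (match if a1[i] == a2[i] else mismatch)
            continue
        row = a1 if a1[i] == "-" else a2        # the row that holds this gap
        starts = i == 0 or row[i - 1] != "-"    # first column of the gap run
        ends = i == n - 1 or row[i + 1] != "-"  # last column of the gap run
        total += w * gap_extend                 # every gap column pays the extend cost
        if split_delta:
            # Assumed even split: half of delta at the open, half at the close.
            total += w * delta / 2 * (starts + ends)
        elif starts:
            # Classic affine accounting: the whole delta at the opening column.
            total += w * delta
    return total


a1, a2 = "AC--GT", "ACTTGT"
weights = [1, 2, 3, 4, 5, 6]  # hypothetical per-column weights

for split in (False, True):
    forward = rescore(a1, a2, weights, split)
    backward = rescore(a1[::-1], a2[::-1], weights[::-1], split)
    print(split, forward, backward)
# False 9.0 5.0
# True 7.0 7.0
```

Only the split variant gives the same score in both directions, which is the kind of reversal symmetry the flag is described as improving.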

Release history: see CHANGELOG.md.

2D Needleman-Wunsch

needleman_wunsch(
    seq1,
    seq2,
    score_matrix,
    gap_open=-5,
    gap_extend=-1,
    enable_gap_close_penalty=True,
    seq1_left_free=False,
    seq1_right_free=False,
    seq2_left_free=False,
    seq2_right_free=False,
    score_scale_fn=score_scale_factor,
) -> tuple[str, str, float]

Example (Python 2D)

from sgad.pairwise import needleman_wunsch, score_scale_factor, to_ascii

mat = {
    "A": {"A": 2, "C": -1, "G": -1, "T": -1},
    "C": {"A": -1, "C": 2, "G": -1, "T": -1},
    "G": {"A": -1, "C": -1, "G": 2, "T": -1},
    "T": {"A": -1, "C": -1, "G": -1, "T": 2},
}

a1, a2, score = needleman_wunsch(
    "GAGATATGAGGAGAGAGAGACAGAGG",
    "GAACAGAGGGAGAGACTAACCTTG",
    score_matrix=mat,
    gap_open=-5,
    gap_extend=-1,
    seq1_left_free=False,
    seq1_right_free=True,
    seq2_left_free=True,
    seq2_right_free=False,
    score_scale_fn=score_scale_factor,
)

print(to_ascii(a1, a2, False, True, True, False))
print(score)

Output:

GAGATATGAGGAGAGAGAGACAGAGG               
                || |||||||               
                GA-ACAGAGGGAGAGACTAACCTTG
8.97420634920635

User-specified arguments

  • score_matrix: substitution model.
  • gap_open, gap_extend: affine-gap parameters.
  • enable_gap_close_penalty: toggle split open/close gap accounting.
  • Four free-end flags: semiglobal boundary behavior per sequence side.
  • score_scale_fn: per-column weighting callback.
    • Use no_score_scale_factor to disable weighting.
    • Use make_score_scaler(...) for configurable inverse-distance weighting.
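
To make the free-end flags more concrete, here is a minimal score-only sketch of semiglobal dynamic programming. It is a simplified stand-in, not SGAD's code: it uses a linear gap penalty instead of affine gaps, applies no score scaling, and assumes non-empty sequences. It also assumes that seqX_left_free / seqX_right_free mean that leading / trailing gap characters in the row of sequence X cost nothing, which matches the shape of the example output above but is not confirmed by the source.

```python
def semiglobal_score(seq1, seq2, match=2, mismatch=-1, gap=-2,
                     seq1_left_free=False, seq1_right_free=False,
                     seq2_left_free=False, seq2_right_free=False):
    """Best alignment score with optional free end gaps (linear gap cost)."""
    n, m = len(seq1), len(seq2)
    # dp[i][j] = best score for aligning seq1[:i] with seq2[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        # seq1 prefix placed opposite leading gaps in the seq2 row
        dp[i][0] = 0 if seq2_left_free else dp[i - 1][0] + gap
    for j in range(1, m + 1):
        # seq2 prefix placed opposite leading gaps in the seq1 row
        dp[0][j] = 0 if seq1_left_free else dp[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if seq1[i - 1] == seq2[j - 1] else mismatch
            # Gap in the seq2 row; it is a trailing gap once seq2 is used up.
            down = 0 if (j == m and seq2_right_free) else gap
            # Gap in the seq1 row; it is a trailing gap once seq1 is used up.
            right = 0 if (i == n and seq1_right_free) else gap
            dp[i][j] = max(dp[i - 1][j - 1] + sub,
                           dp[i - 1][j] + down,
                           dp[i][j - 1] + right)
    return dp[n][m]


# Overlap of the suffix of seq1 with the prefix of seq2: three matches, no end penalties.
overlap = semiglobal_score("AAAACGT", "CGTTTT",
                           seq1_right_free=True, seq2_left_free=True)
print(overlap)  # 6
```

With all four flags left at False, the same call reduces to plain global alignment and the unaligned overhangs are penalized.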

Rust 2D usage

The Rust wrapper API lives under sgad.rust. In this API, score_scale_fn must be either None or a RustScoreScaler object created by make_rust_score_scaler(...).

from sgad.pairwise import to_ascii
from sgad.rust.pairwise import make_rust_score_scaler, needleman_wunsch

mat = {
    "A": {"A": 2, "C": -1, "G": -1, "T": -1},
    "C": {"A": -1, "C": 2, "G": -1, "T": -1},
    "G": {"A": -1, "C": -1, "G": 2, "T": -1},
    "T": {"A": -1, "C": -1, "G": -1, "T": 2},
}

rust_scaler = make_rust_score_scaler(decay_exponent=1.0, temperature=1.0)

a1, a2, score = needleman_wunsch(
    "GAGATATGAGGAGAGAGAGACAGAGG",
    "GAACAGAGGGAGAGACTAACCTTG",
    score_matrix=mat,
    gap_open=-5,
    gap_extend=-1,
    seq1_left_free=False,
    seq1_right_free=True,
    seq2_left_free=True,
    seq2_right_free=False,
    enable_gap_close_penalty=True,
    score_scale_fn=rust_scaler,
)

print(to_ascii(a1, a2, False, True, True, False))
print(score)

Output:

GAGATATGAGGAGAGAGAGACAGAGG               
                || |||||||               
                GA-ACAGAGGGAGAGACTAACCTTG
8.97420634920635

Rust 2D multiprocessing caveat

RustScoreScaler objects are not picklable. In process-based parallelism (multiprocessing, joblib loky), build the scaler inside each worker (or use score_scale_fn=None). Thread-based execution avoids this serialization issue.
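
The "build the scaler inside each worker" pattern can be sketched with the standard library alone. FakeScaler below is a hypothetical stand-in for an unpicklable native object and is not part of SGAD. In real code, init_worker would presumably call make_rust_score_scaler(...) instead, and work would call the Rust needleman_wunsch with score_scale_fn set to the per-process scaler.

```python
import pickle
from concurrent.futures import ProcessPoolExecutor


class FakeScaler:
    """Hypothetical stand-in for an object that refuses to be pickled."""

    def __init__(self, decay_exponent=1.0):
        self.decay_exponent = decay_exponent

    def __reduce__(self):
        raise TypeError("FakeScaler cannot be pickled")

    def __call__(self, distance):
        return 1.0 / (1.0 + distance) ** self.decay_exponent


_worker_scaler = None  # one instance per worker process


def init_worker(decay_exponent):
    """Runs once in each worker: build the scaler there instead of sending it."""
    global _worker_scaler
    _worker_scaler = FakeScaler(decay_exponent)


def work(distance):
    """Task body: only plain, picklable arguments cross the process boundary."""
    return _worker_scaler(distance)


def run_pool(distances, decay_exponent=1.0, max_workers=2):
    with ProcessPoolExecutor(max_workers=max_workers,
                             initializer=init_worker,
                             initargs=(decay_exponent,)) as pool:
        return list(pool.map(work, distances))


# Sending the object itself fails, which is why it is built per worker.
try:
    pickle.dumps(FakeScaler())
except TypeError as exc:
    print("not picklable:", exc)
```

Call run_pool([...]) from under an if __name__ == "__main__": guard so that it also works on platforms that start workers with spawn.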

3D Needleman-Wunsch

Important: 3D currently does not expose score scaling (score_scale_fn).

needleman_wunsch_3d(
    seq1,
    seq2,
    seq3,
    score_matrix,
    gap_open=-5,
    gap_extend=-1,
    enable_gap_close_penalty=True,
    seq1_left_free=False,
    seq1_right_free=False,
    seq2_left_free=False,
    seq2_right_free=False,
    seq3_left_free=False,
    seq3_right_free=False,
) -> tuple[str, str, str, float]

Example (Python 3D)

from sgad.pairwise_3d import needleman_wunsch_3d

mat = {
    "A": {"A": 2, "C": -1, "G": -1, "T": -1},
    "C": {"A": -1, "C": 2, "G": -1, "T": -1},
    "G": {"A": -1, "C": -1, "G": 2, "T": -1},
    "T": {"A": -1, "C": -1, "G": -1, "T": 2},
}

a1, a2, a3, score = needleman_wunsch_3d(
    "CCTGCTACTCTGTTCCCTCAATCTGATAGGTTCC",
    "CCTGCTACTCTGTTCCTTCACATC",
    "CTGTTCCCTCAATCTGATAGGTTCC",
    score_matrix=mat,
    gap_open=-5,
    gap_extend=-1,
    seq1_left_free=False,
    seq1_right_free=False,
    seq2_left_free=False,
    seq2_right_free=True,
    seq3_left_free=True,
    seq3_right_free=False,
)

print(a1)
print(a2)
print(a3)
print(score)

Output:

CCTGCTACTCTGTTCCCTCA-ATCTGATAGGTTCC
CCTGCTACTCTGTTCCTTCACATC-----------
---------CTGTTCCCTCA-ATCTGATAGGTTCC
108.0

User-specified arguments

  • score_matrix: substitution model (sum-of-pairs scoring in 3D).
  • gap_open, gap_extend: affine-gap parameters.
  • enable_gap_close_penalty: toggle split open/close gap accounting.
  • Six free-end flags: semiglobal boundary behavior for all sequence sides.
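
Sum-of-pairs scoring means that each alignment column is scored by adding the substitution score of every pair of residues in it. The sketch below shows only that substitution part for a three-row column. How SGAD scores pairs that involve a gap is not described here, so this sketch simply skips them; that choice and the function name sum_of_pairs are assumptions.

```python
from itertools import combinations

mat = {
    "A": {"A": 2, "C": -1, "G": -1, "T": -1},
    "C": {"A": -1, "C": 2, "G": -1, "T": -1},
    "G": {"A": -1, "C": -1, "G": 2, "T": -1},
    "T": {"A": -1, "C": -1, "G": -1, "T": 2},
}


def sum_of_pairs(column, score_matrix):
    """Substitution score of one column: sum over all residue pairs, gaps skipped."""
    total = 0
    for x, y in combinations(column, 2):
        if x != "-" and y != "-":
            total += score_matrix[x][y]
    return total


print(sum_of_pairs(("A", "A", "A"), mat))  # 6: three matching pairs
print(sum_of_pairs(("A", "A", "C"), mat))  # 0: one match, two mismatches
print(sum_of_pairs(("C", "C", "-"), mat))  # 2: only the gap-free pair counts here
```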

Rust 3D usage

from sgad.rust.pairwise_3d import needleman_wunsch_3d

mat = {
    "A": {"A": 2, "C": -1, "G": -1, "T": -1},
    "C": {"A": -1, "C": 2, "G": -1, "T": -1},
    "G": {"A": -1, "C": -1, "G": 2, "T": -1},
    "T": {"A": -1, "C": -1, "G": -1, "T": 2},
}

a1, a2, a3, score = needleman_wunsch_3d(
    "CCTGCTACTCTGTTCCCTCAATCTGATAGGTTCC",
    "CCTGCTACTCTGTTCCTTCACATC",
    "CTGTTCCCTCAATCTGATAGGTTCC",
    score_matrix=mat,
    gap_open=-5,
    gap_extend=-1,
    seq1_left_free=False,
    seq1_right_free=False,
    seq2_left_free=False,
    seq2_right_free=True,
    seq3_left_free=True,
    seq3_right_free=False,
    enable_gap_close_penalty=True,
)

print(a1)
print(a2)
print(a3)
print(score)

Output:

CCTGCTACTCTGTTCCCTCA-ATCTGATAGGTTCC
CCTGCTACTCTGTTCCTTCACATC-----------
---------CTGTTCCCTCA-ATCTGATAGGTTCC
108.0

Rust 3D multiprocessing caveat

Unlike Rust 2D, there is no scaler object to serialize. For process pools, ensure worker-call arguments remain picklable and import the Rust module in the worker runtime as usual.
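
A quick way to check that worker-call arguments are picklable before handing them to a process pool is to try pickling them up front. The helper name is_picklable is hypothetical. The argument tuple mirrors the strings, nested dict, and booleans used in the example above.

```python
import pickle


def is_picklable(obj):
    """Return True if obj survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
    except Exception:
        return False
    return True


mat = {
    "A": {"A": 2, "C": -1, "G": -1, "T": -1},
    "C": {"A": -1, "C": 2, "G": -1, "T": -1},
    "G": {"A": -1, "C": -1, "G": 2, "T": -1},
    "T": {"A": -1, "C": -1, "G": -1, "T": 2},
}

# Plain strings, dicts, numbers, and booleans are fine to send to workers.
call_args = ("CCTGCTACTCTG", "CCTGCTACTCTG", "CTGTTCC", mat, -5, -1, True)
print(is_picklable(call_args))       # True

# A lambda is a typical example of something that cannot be sent.
print(is_picklable(lambda x: x))     # False
```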

Benchmarking Python vs Rust backends

Based on benchmarks/time_complexity.csv, the Rust backend is consistently much faster than the Python implementation for both 2D and 3D exact DP:

  • 2D common-size comparison (n=500..1500) shows about 248x-252x speedup (for example, n=1500: Python 34.84s vs Rust 0.138s).
  • 3D common-size comparison (n=20..100) shows about 233x-282x speedup (for example, n=100: Python 55.66s vs Rust 0.198s).
  • Under the benchmark stopping rules (timeout/memory guard), Python stopped at smaller maximum sizes while Rust continued to larger sizes (2D up to n=6500, 3D up to n=260 in the recorded run).
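
The quoted speedups can be recomputed from the example timings in the list above. The seconds are copied from the text and are rounded, so the recomputed ratios may differ slightly from the reported ranges.

```python
# (Python seconds, Rust seconds) as quoted in the list above.
timings = {
    "2D, n=1500": (34.84, 0.138),
    "3D, n=100": (55.66, 0.198),
}

speedups = {label: py_s / rust_s for label, (py_s, rust_s) in timings.items()}

for label, speedup in speedups.items():
    print(f"{label}: {speedup:.1f}x")
# 2D, n=1500: 252.5x
# 3D, n=100: 281.1x
```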

Benchmarks were run on Ubuntu 22.04.5 LTS (Linux 6.8.0-1044-aws) on an x86_64 machine with an AMD EPYC 7R13 CPU (16 vCPUs, 8 physical cores with SMT, 32 MiB L3) and 123 GiB RAM (no swap), using uv 0.7.15, rustc 1.87.0, and cargo 1.87.0.

Interface to external dimer assessment libraries

Primer3 interface

from sgad.api.primer3 import heterodimer_batch_primer3

df = heterodimer_batch_primer3(
    primer1_seqs=["ACGTACGT"],
    primer2_seqs=["TGCATGCA"],
    primer1_names=["fwd_1"],
    primer2_names=["rev_1"],
    n_jobs=1,
)

print(df[["primer1_name", "primer2_name", "primer3_tm", "ntthal_t"]].to_string(index=False))

Output:

primer1_name primer2_name  primer3_tm  ntthal_t
       fwd_1        rev_1  -70.205833  -70.2058

IDT OligoAnalyzer interface

from sgad.api.idt import heterodimer_batch_idt

res = heterodimer_batch_idt(
    primer1_seqs=["ACGTACGT"],
    primer2_seqs=["TGCATGCA"],
    primer1_names=["fwd_1"],
    primer2_names=["rev_1"],
    client_id="invalid",
    client_secret="invalid",
    idt_username="invalid",
    idt_password="invalid",
    timeout_s=5.0,
    max_retries=1,
    raise_on_error=False,
)

print(res[0])

This example intentionally uses invalid credentials to show the failure-record shape returned when raise_on_error=False.

Output:

{'primer1_name': 'fwd_1', 'primer2_name': 'rev_1', 'primer1': 'ACGTACGT', 'primer2': 'TGCATGCA', 'ok': False, 'response': None, 'status_code': 400, 'error': '400 Client Error: Bad Request for url: https://www.idtdna.com/Identityserver/connect/token'}
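
When raise_on_error=False, callers need to separate usable results from failure records themselves. The sketch below does this with the 'ok' field shown in the output above. The helper name split_by_status is hypothetical, and the success record is invented purely for illustration, since the shape of a successful response is not shown here.

```python
def split_by_status(records):
    """Partition result records into (succeeded, failed) using the 'ok' field."""
    succeeded = [r for r in records if r.get("ok")]
    failed = [r for r in records if not r.get("ok")]
    return succeeded, failed


# Failure record copied from the output above.
failure = {
    "primer1_name": "fwd_1", "primer2_name": "rev_1",
    "primer1": "ACGTACGT", "primer2": "TGCATGCA",
    "ok": False, "response": None, "status_code": 400,
    "error": "400 Client Error: Bad Request for url: "
             "https://www.idtdna.com/Identityserver/connect/token",
}

# Hypothetical success record: the real 'response' payload is not documented here.
success = {**failure, "ok": True, "response": {"placeholder": True},
           "status_code": 200, "error": None}

succeeded, failed = split_by_status([failure, success])
for record in failed:
    print(record["primer1_name"], record["primer2_name"],
          record["status_code"], record["error"])
```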
