Exact pairwise and 3D sequence alignment utilities

SGAD

Semi-Global Alignment for Dimer calculation

This repository currently exposes two alignment entry points:

  • needleman_wunsch in src/sgad/pairwise.py for 2-sequence alignment.
  • needleman_wunsch_3d in src/sgad/pairwise_3d.py for exact 3-sequence alignment.

Both are global-style dynamic programming aligners with optional free ends (semiglobal behavior when enabled).
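To make the free-end idea concrete, here is a minimal, score-only sketch of a Needleman-Wunsch recurrence with per-sequence free ends. It is not the sgad implementation: it uses a single linear gap penalty instead of affine gaps, the function name semiglobal_score and its parameters are hypothetical, and the flag semantics (a flag frees gap characters written in that sequence's own row) are inferred from the examples in this README.

```python
def semiglobal_score(s1, s2, match=2, mismatch=-1, gap=-2,
                     s1_left_free=False, s1_right_free=False,
                     s2_left_free=False, s2_right_free=False):
    """Score-only global DP with optional free terminal gaps (linear gaps)."""
    n, m = len(s1), len(s2)
    F = [[0] * (m + 1) for _ in range(n + 1)]

    # Column 0 consumes s1 only, so the gaps sit at the left of s2's row.
    for i in range(1, n + 1):
        F[i][0] = 0 if s2_left_free else i * gap
    # Row 0 consumes s2 only, so the gaps sit at the left of s1's row.
    for j in range(1, m + 1):
        F[0][j] = 0 if s1_left_free else j * gap

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s1[i - 1] == s2[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + sub,  # align two residues
                          F[i - 1][j] + gap,      # gap in s2's row
                          F[i][j - 1] + gap)      # gap in s1's row

    best = F[n][m]
    if s2_right_free:  # s2 may end early; the rest of s1 is unpenalized
        best = max(best, max(F[i][m] for i in range(n + 1)))
    if s1_right_free:  # s1 may end early; the rest of s2 is unpenalized
        best = max(best, max(F[n][j] for j in range(m + 1)))
    return best


global_score = semiglobal_score("AAAC", "CGGG")
overlap_score = semiglobal_score("AAAC", "CGGG",
                                 s1_right_free=True, s2_left_free=True)
print(global_score, overlap_score)
```

With all flags off, the two toy sequences are forced into a full-length alignment. Freeing the right end of the first and the left end of the second lets the shared C form a short overlap at no gap cost.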

Pairwise API: needleman_wunsch

Signature (simplified):

needleman_wunsch(
    seq1,
    seq2,
    score_matrix,
    gap_open=-5,
    gap_extend=-1,
    seq1_left_free=False,
    seq1_right_free=False,
    seq2_left_free=False,
    seq2_right_free=False,
    score_scale_fn=score_scale_factor,
) -> tuple[str, str, float]

Example (dimer structure prediction)

from sgad import needleman_wunsch, score_scale_factor, to_ascii

primer1 = "GAGATATGAGGAGAGAGAGACAGAGG"  # right free only
primer2_rc = "GAACAGAGGGAGAGACTAACCTTG"  # left free only

seq1_left_free, seq1_right_free = False, True
seq2_left_free, seq2_right_free = True, False

mat = {
    "A": {"A": 2, "C": -1, "G": -1, "T": -1},
    "C": {"A": -1, "C": 2, "G": -1, "T": -1},
    "G": {"A": -1, "C": -1, "G": 2, "T": -1},
    "T": {"A": -1, "C": -1, "G": -1, "T": 2},
}

a1, a2, score = needleman_wunsch(
    primer1,
    primer2_rc,
    score_matrix=mat,
    gap_open=-5,
    gap_extend=-1,
    seq1_left_free=seq1_left_free,
    seq1_right_free=seq1_right_free,
    seq2_left_free=seq2_left_free,
    seq2_right_free=seq2_right_free,
    score_scale_fn=score_scale_factor,
)

print(to_ascii(a1, a2))
print(score)
# Output:
# GAGATATGAGGAGAGAGAGACAGAGG---------------
#                 || |||||||               
# ----------------GA-ACAGAGGGAGAGACTAACCTTG
# 8.974206349206348
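
The middle line of the output marks columns where both rows carry the same base. The sketch below reproduces that marker line for the alignment shown above. It is only an illustration of the display convention: match_line is a hypothetical helper, not the package's to_ascii, and how to_ascii renders mismatches is not documented here, so mismatches and gaps are simply left blank.

```python
def match_line(row1: str, row2: str, gap: str = "-") -> str:
    """Return '|' under identical non-gap columns and ' ' elsewhere."""
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have equal length")
    return "".join(
        "|" if a == b and a != gap else " "
        for a, b in zip(row1, row2)
    )


# The two aligned rows from the example output above.
a1 = "GAGATATGAGGAGAGAGAGACAGAGG" + "-" * 15
a2 = "-" * 16 + "GA-ACAGAGGGAGAGACTAACCTTG"

print(a1)
print(match_line(a1, a2))
print(a2)
```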

User-specified values

  • score_matrix: your substitution model.
  • gap_open and gap_extend: affine gap penalties.
  • Free-end flags: whether leading/trailing gaps are free per sequence.
  • score_scale_fn:
    • Preimplemented options:
      • score_scale_factor: default behavior in this module.
      • no_score_scale_factor: disable scaling (always returns 1.0).
    • User-defined option:
      • You can pass your own callable with the same signature as score_scale_factor.
      • The function must accept four indices plus the four *_free flags and return a float scale.

Example custom function:

def my_score_scale_fn(
    seq1_left_idx: int,
    seq1_right_idx: int,
    seq2_left_idx: int,
    seq2_right_idx: int,
    seq1_left_free: bool,
    seq1_right_free: bool,
    seq2_left_free: bool,
    seq2_right_free: bool,
) -> float:
    # Replace with your own domain-specific scaling logic.
    return 1.0


# primer1, primer2_rc, mat, and the *_free flags are defined in the example above.
a1, a2, score = needleman_wunsch(
    primer1,
    primer2_rc,
    score_matrix=mat,
    gap_open=-5,
    gap_extend=-1,
    seq1_left_free=seq1_left_free,
    seq1_right_free=seq1_right_free,
    seq2_left_free=seq2_left_free,
    seq2_right_free=seq2_right_free,
    score_scale_fn=my_score_scale_fn,
)

Pairwise features

Supported:

  • Exact DP optimum for two sequences.
  • Affine gap penalties.
  • Per-sequence free left/right ends.
  • Deterministic tie-breaking.
  • Pluggable score scaling callback (score_scale_fn).

Not supported yet:

  • Local alignment (Smith-Waterman).
  • Automatic reverse-complement generation.
  • Built-in ambiguous alphabet handling (for example N) unless you include it in score_matrix.
  • Banding/heuristics for very long sequences.
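
Because reverse-complementing and ambiguous letters are left to the caller, a small preprocessing step can cover both. The sketch below is one possible approach, not part of sgad: reverse_complement and build_matrix are hypothetical helpers, the nested-dict layout mirrors the mat used in the examples, and scoring N as 0 against everything is an arbitrary choice made for illustration.

```python
# Translation table for DNA bases; N maps to itself.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_matrix(alphabet="ACGT", match=2, mismatch=-1,
                 wildcard="N", wildcard_score=0):
    """Build a nested-dict substitution matrix that also covers a wildcard."""
    letters = alphabet + wildcard
    mat = {}
    for a in letters:
        mat[a] = {}
        for b in letters:
            if wildcard in (a, b):
                mat[a][b] = wildcard_score  # assumption: N is neutral
            else:
                mat[a][b] = match if a == b else mismatch
    return mat


mat_with_n = build_matrix()
print(reverse_complement("GATTACA"))
print(mat_with_n["A"]["A"], mat_with_n["A"]["C"], mat_with_n["A"]["N"])
```

The resulting dictionary can then be passed as score_matrix, and the reverse-complemented primer as one of the input sequences.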

3D API: needleman_wunsch_3d

Signature (simplified):

needleman_wunsch_3d(
    seq1,
    seq2,
    seq3,
    score_matrix,
    gap_open=-5,
    gap_extend=-1,
    seq1_left_free=False,
    seq1_right_free=False,
    seq2_left_free=False,
    seq2_right_free=False,
    seq3_left_free=False,
    seq3_right_free=False,
) -> tuple[str, str, str, float]

Example (dimer + two primers)

from sgad import needleman_wunsch_3d

dimer = "CCTGCTACTCTGTTCCCTCAATCTGATAGGTTCC"  # anchored
primer1 = "CCTGCTACTCTGTTCCTTCACATC"  # right free only
primer2_rc = "CTGTTCCCTCAATCTGATAGGTTCC"  # left free only

seq1_left_free = seq1_right_free = False
seq2_left_free, seq2_right_free = False, True
seq3_left_free, seq3_right_free = True, False

mat = {
    "A": {"A": 2, "C": -1, "G": -1, "T": -1},
    "C": {"A": -1, "C": 2, "G": -1, "T": -1},
    "G": {"A": -1, "C": -1, "G": 2, "T": -1},
    "T": {"A": -1, "C": -1, "G": -1, "T": 2},
}

a1, a2, a3, score = needleman_wunsch_3d(
    dimer,
    primer1,
    primer2_rc,
    score_matrix=mat,
    gap_open=-5,
    gap_extend=-1,
    seq1_left_free=seq1_left_free,
    seq1_right_free=seq1_right_free,
    seq2_left_free=seq2_left_free,
    seq2_right_free=seq2_right_free,
    seq3_left_free=seq3_left_free,
    seq3_right_free=seq3_right_free,
)

print(a1)
print(a2)
print(a3)
print(score)
# Output:
# CCTGCTACTCTGTTCCCTCA-ATCTGATAGGTTCC
# CCTGCTACTCTGTTCCTTCACATC-----------
# ---------CTGTTCCCTCA-ATCTGATAGGTTCC
# 108.0

User-specified values

  • score_matrix for all letters you expect.
  • Affine gap penalties.
  • Left/right free-end settings for each of the 3 sequences.

3D features

Supported:

  • Exact 3-sequence DP optimum (no heuristic shortcuts).
  • Sum-of-pairs substitution scoring.
  • Affine gaps per sequence.
  • Per-sequence free terminal gaps (left/right).
  • Deterministic tie-breaking.
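
As a rough check of what sum-of-pairs scoring means, the sketch below recomputes the score of the three-row alignment printed in the example above. It rests on assumptions rather than on the sgad source: residue pairs are scored with the substitution matrix, pairs involving a gap contribute nothing, free terminal gaps cost nothing, and an internal gap of length L in a row costs gap_open + (L - 1) * gap_extend. The helper names sum_of_pairs and affine_gap_cost are hypothetical. These assumptions were chosen because they reproduce the printed 108.0; the actual accounting inside the package may differ.

```python
from itertools import combinations


def sum_of_pairs(rows, mat, gap="-"):
    """Sum substitution scores over every residue pair in every column."""
    total = 0
    for column in zip(*rows):
        for x, y in combinations(column, 2):
            if x != gap and y != gap:  # assumption: gap pairs score 0 here
                total += mat[x][y]
    return total


def affine_gap_cost(length, gap_open=-5, gap_extend=-1):
    """Assumed convention: the first gap column costs gap_open only."""
    return gap_open + (length - 1) * gap_extend


mat = {a: {b: (2 if a == b else -1) for b in "ACGT"} for a in "ACGT"}
rows = [
    "CCTGCTACTCTGTTCCCTCA-ATCTGATAGGTTCC",
    "CCTGCTACTCTGTTCCTTCACATC-----------",
    "---------CTGTTCCCTCA-ATCTGATAGGTTCC",
]

substitution = sum_of_pairs(rows, mat)
# Rows 1 and 3 each contain one internal single-column gap; the leading gap
# in row 3 and the trailing gap in row 2 are free ends in the example.
internal_gaps = 2 * affine_gap_cost(1)
total = substitution + internal_gaps
print(substitution, internal_gaps, total)
```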

Not supported yet:

  • Pluggable score_scale_fn (3D currently has no scaling callback parameter).
  • Local alignment mode.
  • Banded or heuristic memory/time reduction for long inputs.
  • Automatic sequence preprocessing (reverse-complementing, case-specific cleanup).

Rust Backend

You can use Rust-accelerated implementations for both needleman_wunsch (2D) and needleman_wunsch_3d (3D).

Rust 2D usage

from sgad.rust.pairwise import needleman_wunsch

# seq1, seq2, and mat are defined as in the pairwise example above.
a1, a2, score = needleman_wunsch(
    seq1,
    seq2,
    score_matrix=mat,
    gap_open=-5,
    gap_extend=-1,
)

Rust 2D score scaling options:

  • Default: no scaling (score_scale_fn=None).
  • Explicitly disabled scaling (score_scale_fn=no_score_scale_factor).
  • Native Rust scaler objects created by make_rust_score_scaler.

Arbitrary Python scaling callables are not supported in the Rust 2D backend.

from sgad.rust.pairwise import make_rust_score_scaler, needleman_wunsch

rust_scaler = make_rust_score_scaler(decay_exponent=1.3, temperature=0.9)

a1, a2, score = needleman_wunsch(
    seq1,
    seq2,
    score_matrix=mat,
    score_scale_fn=rust_scaler,
)

Equivalent direct-native 2D usage:

from sgad.rust.sgad_rust_native import make_rust_score_scaler, needleman_wunsch

rust_scaler = make_rust_score_scaler(decay_exponent=1.3, temperature=0.9)

a1, a2, score = needleman_wunsch(
    seq1,
    seq2,
    score_matrix=mat,
    score_scale_fn=rust_scaler,
)

Rust 3D usage

from sgad.rust.pairwise_3d import needleman_wunsch_3d

a1, a2, a3, score = needleman_wunsch_3d(
    seq1,
    seq2,
    seq3,
    score_matrix=mat,
    gap_open=-5,
    gap_extend=-1,
    seq1_left_free=False,
    seq1_right_free=False,
    seq2_left_free=False,
    seq2_right_free=False,
    seq3_left_free=False,
    seq3_right_free=False,
)

Equivalent direct-native 3D usage:

from sgad.rust.sgad_rust_native import needleman_wunsch_3d

a1, a2, a3, score = needleman_wunsch_3d(
    seq1,
    seq2,
    seq3,
    score_matrix=mat,
)

Benchmark Results

Based on benchmarks/time_complexity.csv, the Rust backend is roughly 230x to 280x faster than the Python implementation at the sizes both backends completed, for both 2D and 3D exact DP:

  • 2D common-size comparison (n=500..1500) shows about 248x-252x speedup (for example, n=1500: Python 34.84s vs Rust 0.138s).
  • 3D common-size comparison (n=20..100) shows about 233x-282x speedup (for example, n=100: Python 55.66s vs Rust 0.198s).
  • Under the benchmark stopping rules (timeout/memory guard), Python stopped at smaller maximum sizes while Rust continued to larger sizes (2D up to n=6500, 3D up to n=260 in the recorded run).

Benchmarks were run on Ubuntu 22.04.5 LTS (Linux 6.8.0-1044-aws) on an x86_64 machine with an AMD EPYC 7R13 CPU (16 vCPUs, 8 physical cores with SMT, 32 MiB L3) and 123 GiB RAM (no swap), using uv 0.7.15, rustc 1.87.0, and cargo 1.87.0.
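
To put the reported sizes in perspective, the sketch below counts DP cells for a full table and recomputes the two example speedups from the timings quoted above. It assumes equal-length inputs of length n and one cell per index combination. The number of states actually stored per cell (for instance for affine gaps) depends on the implementation and is not modeled. dp_cells is a hypothetical helper.

```python
def dp_cells(n: int, num_sequences: int) -> int:
    """Cells in a full DP table over num_sequences inputs of length n."""
    return (n + 1) ** num_sequences


# Largest sizes reached by the Rust backend in the recorded run.
for label, n, k in [("2D", 6500, 2), ("3D", 260, 3)]:
    print(f"{label} n={n}: {dp_cells(n, k):,} cells")

# Speedups implied by the example timings (Python seconds / Rust seconds).
speedup_2d = 34.84 / 0.138   # n=1500
speedup_3d = 55.66 / 0.198   # n=100
print(f"2D speedup: {speedup_2d:.1f}x, 3D speedup: {speedup_3d:.1f}x")
```

The quadratic and cubic growth is why the 3D benchmark stops at much smaller n than the 2D one.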
