Exact pairwise and 3D sequence alignment utilities

SGAD

Semi-Global Alignment for Dimer calculation

This repository currently exposes two alignment entry points:

  • needleman_wunsch in src/sgad/pairwise.py for 2-sequence alignment.
  • needleman_wunsch_3d in src/sgad/pairwise_3d.py for exact 3-sequence alignment.

Both are global-style dynamic programming aligners with optional free ends (semiglobal behavior when enabled).
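
To make the free-end idea concrete, here is a minimal score-only sketch. It is not the sgad implementation: it uses a single linear gap penalty instead of affine gaps, and the function name `semiglobal_score` and its parameters are hypothetical. It assumes, as the examples in this README suggest, that a sequence's "free" flag waives the penalty for gap characters at that end of that sequence's own aligned row.

```python
def semiglobal_score(
    s1, s2, match=2, mismatch=-1, gap=-2,
    s1_left_free=False, s1_right_free=False,
    s2_left_free=False, s2_right_free=False,
):
    """Score-only alignment with linear gaps and optional free end gaps."""
    n, m = len(s1), len(s2)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    # Column 0: s1 characters sit opposite leading gaps in s2's row.
    for i in range(1, n + 1):
        dp[i][0] = 0 if s2_left_free else i * gap
    # Row 0: s2 characters sit opposite leading gaps in s1's row.
    for j in range(1, m + 1):
        dp[0][j] = 0 if s1_left_free else j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s1[i - 1] == s2[j - 1] else mismatch
            # In the last column s2 is used up, so a gap in its row is trailing.
            gap_in_s2 = 0 if (j == m and s2_right_free) else gap
            # In the last row s1 is used up, so a gap in its row is trailing.
            gap_in_s1 = 0 if (i == n and s1_right_free) else gap
            dp[i][j] = max(
                dp[i - 1][j - 1] + sub,      # align s1[i-1] with s2[j-1]
                dp[i - 1][j] + gap_in_s2,    # s1[i-1] opposite a gap in s2's row
                dp[i][j - 1] + gap_in_s1,    # s2[j-1] opposite a gap in s1's row
            )
    return dp[n][m]


# The 3' end of s1 overlaps the 5' end of s2 ("CGT").
overlap = semiglobal_score("AAAACGT", "CGTTTT", s1_right_free=True, s2_left_free=True)
strict = semiglobal_score("AAAACGT", "CGTTTT")
print(overlap, strict)
```

With both overhangs free, the three-base overlap scores 6 with no penalty; with all flags off, the same inputs score lower because the overhangs must be paid for.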

Pairwise API: needleman_wunsch

Signature (simplified):

needleman_wunsch(
    seq1,
    seq2,
    score_matrix,
    gap_open=-5,
    gap_extend=-1,
    seq1_left_free=False,
    seq1_right_free=False,
    seq2_left_free=False,
    seq2_right_free=False,
    score_scale_fn=score_scale_factor,
) -> tuple[str, str, float]

Example (dimer structure prediction)

from sgad import needleman_wunsch, score_scale_factor, to_ascii

primer1 = "GAGATATGAGGAGAGAGAGACAGAGG"  # right free only
primer2_rc = "GAACAGAGGGAGAGACTAACCTTG"  # left free only

seq1_left_free, seq1_right_free = False, True
seq2_left_free, seq2_right_free = True, False

mat = {
    "A": {"A": 2, "C": -1, "G": -1, "T": -1},
    "C": {"A": -1, "C": 2, "G": -1, "T": -1},
    "G": {"A": -1, "C": -1, "G": 2, "T": -1},
    "T": {"A": -1, "C": -1, "G": -1, "T": 2},
}

a1, a2, score = needleman_wunsch(
    primer1,
    primer2_rc,
    score_matrix=mat,
    gap_open=-5,
    gap_extend=-1,
    seq1_left_free=seq1_left_free,
    seq1_right_free=seq1_right_free,
    seq2_left_free=seq2_left_free,
    seq2_right_free=seq2_right_free,
    score_scale_fn=score_scale_factor,
)

print(to_ascii(a1, a2))
print(score)
# Output:
# GAGATATGAGGAGAGAGAGACAGAGG---------------
#                 || |||||||               
# ----------------GA-ACAGAGGGAGAGACTAACCTTG
# 8.974206349206348
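
The nested-dict score matrix above can also be generated rather than typed out. The helper below is a small sketch; `build_score_matrix` is a hypothetical name and not part of sgad. It only relies on the dict-of-dicts layout shown in the example.

```python
def build_score_matrix(alphabet="ACGT", match=2, mismatch=-1):
    """Return a dict-of-dicts substitution matrix with a uniform match/mismatch score."""
    return {
        a: {b: (match if a == b else mismatch) for b in alphabet}
        for a in alphabet
    }


mat = build_score_matrix()
print(mat["A"])
```

Called with its defaults, this produces the same matrix as the literal `mat` in the example.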

User-specified values

  • score_matrix: your substitution model.
  • gap_open and gap_extend: affine gap penalties.
  • Free-end flags: whether leading/trailing gaps are free per sequence.
  • score_scale_fn:
    • Preimplemented options:
      • score_scale_factor: default behavior in this module.
      • no_score_scale_factor: disable scaling (always returns 1.0).
    • User-defined option:
      • You can pass your own callable with the same signature as score_scale_factor.
      • The function must accept four indices plus the four *_free flags and return a float scale.

Example custom function:

def my_score_scale_fn(
    seq1_left_idx: int,
    seq1_right_idx: int,
    seq2_left_idx: int,
    seq2_right_idx: int,
    seq1_left_free: bool,
    seq1_right_free: bool,
    seq2_left_free: bool,
    seq2_right_free: bool,
) -> float:
    # Replace with your own domain-specific scaling logic.
    return 1.0


a1, a2, score = needleman_wunsch(
    primer1,
    primer2_rc,
    score_matrix=mat,
    gap_open=-5,
    gap_extend=-1,
    seq1_left_free=seq1_left_free,
    seq1_right_free=seq1_right_free,
    seq2_left_free=seq2_left_free,
    seq2_right_free=seq2_right_free,
    score_scale_fn=my_score_scale_fn,
)

Pairwise features

Supported:

  • Exact DP optimum for two sequences.
  • Affine gap penalties.
  • Per-sequence free left/right ends.
  • Deterministic tie-breaking.
  • Pluggable score scaling callback (score_scale_fn).

Not supported yet:

  • Local alignment (Smith-Waterman).
  • Automatic reverse-complement generation.
  • Built-in ambiguous alphabet handling (for example N) unless you include it in score_matrix.
  • Banding/heuristics for very long sequences.
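
Since reverse-complementing and ambiguous symbols are left to the caller, the two preprocessing steps might look like the sketch below. Both helpers are hypothetical and not part of sgad, and the neutral score of 0 for N is an assumption you should adapt to your own model.

```python
# Translation table for DNA bases, keeping case and passing N through unchanged.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def with_wildcard(matrix, symbol="N", score=0):
    """Return a copy of a dict-of-dicts matrix extended with one wildcard symbol."""
    symbols = list(matrix) + [symbol]
    extended = {a: dict(row) for a, row in matrix.items()}
    extended[symbol] = {}
    for a in symbols:
        extended[a][symbol] = score  # existing symbol (or N) vs N
        extended[symbol][a] = score  # N vs existing symbol (or N)
    return extended


base = {a: {b: (2 if a == b else -1) for b in "ACGT"} for a in "ACGT"}
mat_n = with_wildcard(base)
print(reverse_complement("GATTACA"))
print(mat_n["N"]["A"], mat_n["C"]["N"])
```

The original matrix is left untouched, so the same base matrix can be reused with different wildcard scores.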

3D API: needleman_wunsch_3d

Signature (simplified):

needleman_wunsch_3d(
    seq1,
    seq2,
    seq3,
    score_matrix,
    gap_open=-5,
    gap_extend=-1,
    seq1_left_free=False,
    seq1_right_free=False,
    seq2_left_free=False,
    seq2_right_free=False,
    seq3_left_free=False,
    seq3_right_free=False,
) -> tuple[str, str, str, float]

Example (dimer + two primers)

from sgad import needleman_wunsch_3d

dimer = "CCTGCTACTCTGTTCCCTCAATCTGATAGGTTCC"  # anchored
primer1 = "CCTGCTACTCTGTTCCTTCACATC"  # right free only
primer2_rc = "CTGTTCCCTCAATCTGATAGGTTCC"  # left free only

seq1_left_free = seq1_right_free = False
seq2_left_free, seq2_right_free = False, True
seq3_left_free, seq3_right_free = True, False

mat = {
    "A": {"A": 2, "C": -1, "G": -1, "T": -1},
    "C": {"A": -1, "C": 2, "G": -1, "T": -1},
    "G": {"A": -1, "C": -1, "G": 2, "T": -1},
    "T": {"A": -1, "C": -1, "G": -1, "T": 2},
}

a1, a2, a3, score = needleman_wunsch_3d(
    dimer,
    primer1,
    primer2_rc,
    score_matrix=mat,
    gap_open=-5,
    gap_extend=-1,
    seq1_left_free=seq1_left_free,
    seq1_right_free=seq1_right_free,
    seq2_left_free=seq2_left_free,
    seq2_right_free=seq2_right_free,
    seq3_left_free=seq3_left_free,
    seq3_right_free=seq3_right_free,
)

print(a1)
print(a2)
print(a3)
print(score)
# Output:
# CCTGCTACTCTGTTCCCTCA-ATCTGATAGGTTCC
# CCTGCTACTCTGTTCCTTCACATC-----------
# ---------CTGTTCCCTCA-ATCTGATAGGTTCC
# 108.0

User-specified values

  • score_matrix for all letters you expect.
  • Affine gap penalties.
  • Left/right free-end settings for each of the 3 sequences.

3D features

Supported:

  • Exact 3-sequence DP optimum (no heuristic shortcuts).
  • Sum-of-pairs substitution scoring.
  • Affine gaps per sequence.
  • Per-sequence free terminal gaps (left/right).
  • Deterministic tie-breaking.
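
As an illustration of sum-of-pairs scoring, the sketch below adds up the substitution scores of every pair of rows in each column and skips any pair that involves a gap. The helper `sum_of_pairs_substitution` is hypothetical, and how sgad itself combines gap penalties with this sum is not stated here, so the code covers the substitution part only.

```python
from itertools import combinations


def sum_of_pairs_substitution(rows, matrix, gap_char="-"):
    """Sum substitution scores over all row pairs in every alignment column."""
    total = 0
    for column in zip(*rows):
        for a, b in combinations(column, 2):
            if a != gap_char and b != gap_char:
                total += matrix[a][b]
    return total


mat = {a: {b: (2 if a == b else -1) for b in "ACGT"} for a in "ACGT"}
rows = [
    "CCTGCTACTCTGTTCCCTCA-ATCTGATAGGTTCC",
    "CCTGCTACTCTGTTCCTTCACATC-----------",
    "---------CTGTTCCCTCA-ATCTGATAGGTTCC",
]
substitution_total = sum_of_pairs_substitution(rows, mat)
print(substitution_total)  # 118
```

For the example alignment above this gives 118, which is 10 more than the reported score of 108.0. That difference would be consistent with the two internal gap characters in column 21 each costing gap_open=-5 while the terminal gaps are free, but this is an inference from the numbers rather than a documented rule.
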

Not supported yet:

  • Pluggable score_scale_fn (3D currently has no scaling callback parameter).
  • Local alignment mode.
  • Banded or heuristic memory/time reduction for long inputs.
  • Automatic sequence preprocessing (reverse-complementing, case-specific cleanup).
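
Without banding, an exact DP visits one cell per combination of prefix lengths, so the table grows as the product of the sequence lengths. The sketch below counts cells only. `dp_cells` and `estimate_bytes` are hypothetical helpers, and the per-cell storage figures are placeholder assumptions, not measurements of sgad.

```python
from math import prod


def dp_cells(*lengths):
    """Number of DP cells for sequences of the given lengths (one extra per axis)."""
    return prod(n + 1 for n in lengths)


def estimate_bytes(lengths, values_per_cell=1, bytes_per_value=8):
    """Rough memory estimate; both keyword defaults are assumptions."""
    return dp_cells(*lengths) * values_per_cell * bytes_per_value


print(dp_cells(1500, 1500))     # 2253001
print(dp_cells(100, 100, 100))  # 1030301
print(estimate_bytes((260, 260, 260)))
```

Three sequences of length 100 already need about half as many cells as two sequences of length 1500, which is why the 3D aligner is practical only for short inputs.
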

Rust Backend

You can use Rust-accelerated implementations for both needleman_wunsch (2D) and needleman_wunsch_3d (3D).

Rust 2D usage

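from sgad.rust.pairwise import needleman_wunsch

# seq1 and seq2 are your input strings; mat is a score matrix
# such as the one defined in the pairwise example.
a1, a2, score = needleman_wunsch(
    seq1,
    seq2,
    score_matrix=mat,
    gap_open=-5,
    gap_extend=-1,
)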

Rust 2D score scaling options:

  • Default: no scaling (score_scale_fn=None). This differs from the Python needleman_wunsch, whose default is score_scale_factor.
  • Explicit no scaling (score_scale_fn=no_score_scale_factor).
  • Native Rust scaler objects created by make_rust_score_scaler.

The Rust 2D backend does not support arbitrary Python scaling callables.

from sgad.rust.pairwise import make_rust_score_scaler, needleman_wunsch

rust_scaler = make_rust_score_scaler(decay_exponent=1.3, temperature=0.9)

a1, a2, score = needleman_wunsch(
    seq1,
    seq2,
    score_matrix=mat,
    score_scale_fn=rust_scaler,
)

Equivalent direct-native 2D usage:

from sgad.rust.sgad_rust_native import make_rust_score_scaler, needleman_wunsch

rust_scaler = make_rust_score_scaler(decay_exponent=1.3, temperature=0.9)

a1, a2, score = needleman_wunsch(
    seq1,
    seq2,
    score_matrix=mat,
    score_scaler_fn=rust_scaler,
)

Rust 3D usage

from sgad.rust.pairwise_3d import needleman_wunsch_3d

a1, a2, a3, score = needleman_wunsch_3d(
    seq1,
    seq2,
    seq3,
    score_matrix=mat,
    gap_open=-5,
    gap_extend=-1,
    seq1_left_free=False,
    seq1_right_free=False,
    seq2_left_free=False,
    seq2_right_free=False,
    seq3_left_free=False,
    seq3_right_free=False,
)

Equivalent direct-native 3D usage:

from sgad.rust.sgad_rust_native import needleman_wunsch_3d

a1, a2, a3, score = needleman_wunsch_3d(
    seq1,
    seq2,
    seq3,
    score_matrix=mat,
)
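
Because both backends expose a function named needleman_wunsch, a caller can prefer the Rust one and fall back to Python when the compiled module is unavailable. The sketch below uses only the import paths shown in this README; `load_pairwise_aligner` itself is a hypothetical helper.

```python
def load_pairwise_aligner():
    """Return (function, backend_name), preferring the Rust backend."""
    try:
        from sgad.rust.pairwise import needleman_wunsch
        return needleman_wunsch, "rust"
    except ImportError:
        pass
    try:
        from sgad import needleman_wunsch
        return needleman_wunsch, "python"
    except ImportError:
        # sgad is not installed at all.
        return None, "unavailable"


aligner, backend = load_pairwise_aligner()
print(backend)
```

The two backends have different score_scale_fn defaults, so pass that argument explicitly if scores need to be comparable across them.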

Benchmark Results

Based on benchmarks/time_complexity.csv, the Rust backend is roughly 230x to 280x faster than the Python implementation for both 2D and 3D exact DP at the sizes both backends completed:

  • 2D common-size comparison (n=500..1500) shows about 248x-252x speedup (for example, n=1500: Python 34.84s vs Rust 0.138s).
  • 3D common-size comparison (n=20..100) shows about 233x-282x speedup (for example, n=100: Python 55.66s vs Rust 0.198s).
  • Under the benchmark stopping rules (timeout/memory guard), Python stopped at smaller maximum sizes while Rust continued to larger sizes (2D up to n=6500, 3D up to n=260 in the recorded run).
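
The speedup figures follow directly from the quoted timings. The short check below recomputes them from the two example data points listed above; no other benchmark rows are assumed.

```python
# (Python seconds, Rust seconds) as quoted in the bullets above.
timings = {
    "2D, n=1500": (34.84, 0.138),
    "3D, n=100": (55.66, 0.198),
}

speedups = {label: py / rs for label, (py, rs) in timings.items()}
for label, factor in speedups.items():
    print(f"{label}: {factor:.0f}x")
```

This prints about 252x for the 2D case and about 281x for the 3D case, matching the upper ends of the ranges given.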

Benchmarks were run on Ubuntu 22.04.5 LTS (Linux 6.8.0-1044-aws) on an x86_64 machine with an AMD EPYC 7R13 CPU (16 vCPUs, 8 physical cores with SMT, 32 MiB L3) and 123 GiB RAM (no swap), using uv 0.7.15, rustc 1.87.0, and cargo 1.87.0.
