Exact pairwise and 3D sequence alignment utilities
SGAD
Semi-Global Alignment for Dimer calculation
This repository currently exposes two alignment entry points:
- needleman_wunsch in src/sgad/pairwise.py for 2-sequence alignment.
- needleman_wunsch_3d in src/sgad/pairwise_3d.py for exact 3-sequence alignment.
Both are global-style dynamic programming aligners with optional free ends (semiglobal behavior when enabled).
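To make the free-end idea concrete, here is a minimal sketch of a semiglobal score computation. It is not the SGAD implementation: it uses a linear gap penalty instead of affine gaps, applies no score scaling, and returns only the score. The function name, its parameters, and the reading of the flags (a free end on a sequence means gaps padded onto that end of its aligned row cost nothing) are assumptions inferred from the examples in this README.

```python
def semiglobal_score(
    s1, s2, match=2, mismatch=-1, gap=-2,
    s1_left_free=False, s1_right_free=False,
    s2_left_free=False, s2_right_free=False,
):
    """Illustrative global/semiglobal DP score with a linear gap penalty."""
    n, m = len(s1), len(s2)
    dp = [[0] * (m + 1) for _ in range(n + 1)]

    # Column 0: a prefix of s1 sits opposite leading gaps in the s2 row.
    for i in range(1, n + 1):
        dp[i][0] = 0 if s2_left_free else i * gap
    # Row 0: a prefix of s2 sits opposite leading gaps in the s1 row.
    for j in range(1, m + 1):
        dp[0][j] = 0 if s1_left_free else j * gap

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s1[i - 1] == s2[j - 1] else mismatch
            dp[i][j] = max(
                dp[i - 1][j - 1] + sub,  # align s1[i-1] with s2[j-1]
                dp[i - 1][j] + gap,      # gap in the s2 row
                dp[i][j - 1] + gap,      # gap in the s1 row
            )

    best = dp[n][m]
    if s2_right_free:
        # Trailing gaps in the s2 row are free: s1 may end with an overhang.
        best = max(best, max(dp[i][m] for i in range(n + 1)))
    if s1_right_free:
        # Trailing gaps in the s1 row are free: s2 may end with an overhang.
        best = max(best, max(dp[n][j] for j in range(m + 1)))
    return best
```

With all flags left at False the function computes an ordinary global score. Enabling s1_right_free and s2_left_free lets the end of s1 overlap the start of s2 without paying for the overhangs.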
Pairwise API: needleman_wunsch
Signature (simplified):
needleman_wunsch(
    seq1,
    seq2,
    score_matrix,
    gap_open=-5,
    gap_extend=-1,
    seq1_left_free=False,
    seq1_right_free=False,
    seq2_left_free=False,
    seq2_right_free=False,
    score_scale_fn=score_scale_factor,
) -> tuple[str, str, float]
Example (dimer structure prediction)
from sgad import needleman_wunsch, score_scale_factor, to_ascii

primer1 = "GAGATATGAGGAGAGAGAGACAGAGG"  # right free only
primer2_rc = "GAACAGAGGGAGAGACTAACCTTG"  # left free only
seq1_left_free, seq1_right_free = False, True
seq2_left_free, seq2_right_free = True, False
mat = {
    "A": {"A": 2, "C": -1, "G": -1, "T": -1},
    "C": {"A": -1, "C": 2, "G": -1, "T": -1},
    "G": {"A": -1, "C": -1, "G": 2, "T": -1},
    "T": {"A": -1, "C": -1, "G": -1, "T": 2},
}
a1, a2, score = needleman_wunsch(
    primer1,
    primer2_rc,
    score_matrix=mat,
    gap_open=-5,
    gap_extend=-1,
    seq1_left_free=seq1_left_free,
    seq1_right_free=seq1_right_free,
    seq2_left_free=seq2_left_free,
    seq2_right_free=seq2_right_free,
    score_scale_fn=score_scale_factor,
)
print(to_ascii(a1, a2))
print(score)
# Output:
# GAGATATGAGGAGAGAGAGACAGAGG---------------
#                 || |||||||
# ----------------GA-ACAGAGGGAGAGACTAACCTTG
# 8.974206349206348
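The middle line of the output marks the columns where the two aligned rows carry the same base. As a rough illustration of how such a line can be derived, the sketch below builds it from two aligned strings. The helper match_line is hypothetical and is not part of SGAD; how to_ascii actually renders its output is not documented here.

```python
def match_line(a1: str, a2: str, gap_char: str = "-") -> str:
    """Return '|' under identical non-gap columns and ' ' elsewhere."""
    if len(a1) != len(a2):
        raise ValueError("aligned rows must have equal length")
    return "".join(
        "|" if x == y and x != gap_char else " "
        for x, y in zip(a1, a2)
    )


# Small example: two matching columns, one gap column, one matching column.
print("ACG-T")
print(match_line("ACG-T", "AC-GT"))
print("AC-GT")
```

Counting the '|' characters gives the number of identical columns, which is a quick sanity check on a predicted dimer overlap.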
User-specified values
- score_matrix: your substitution model.
- gap_open and gap_extend: affine gap penalties.
- Free-end flags: whether leading/trailing gaps are free per sequence.
- score_scale_fn:
  - Preimplemented options:
    - score_scale_factor: default behavior in this module.
    - no_score_scale_factor: disable scaling (always returns 1.0).
  - User-defined option:
    - You can pass your own callable with the same signature as score_scale_factor.
    - The function must accept four indices plus the four *_free flags and return a float scale.
Example custom function:
def my_score_scale_fn(
    seq1_left_idx: int,
    seq1_right_idx: int,
    seq2_left_idx: int,
    seq2_right_idx: int,
    seq1_left_free: bool,
    seq1_right_free: bool,
    seq2_left_free: bool,
    seq2_right_free: bool,
) -> float:
    # Replace with your own domain-specific scaling logic.
    return 1.0

# Reuses primer1, primer2_rc, mat, and the *_free flags from the example above.
a1, a2, score = needleman_wunsch(
    primer1,
    primer2_rc,
    score_matrix=mat,
    gap_open=-5,
    gap_extend=-1,
    seq1_left_free=seq1_left_free,
    seq1_right_free=seq1_right_free,
    seq2_left_free=seq2_left_free,
    seq2_right_free=seq2_right_free,
    score_scale_fn=my_score_scale_fn,
)
Pairwise features
Supported:
- Exact DP optimum for two sequences.
- Affine gap penalties.
- Per-sequence free left/right ends.
- Deterministic tie-breaking.
- Pluggable score scaling callback (score_scale_fn).
Not supported yet:
- Local alignment (Smith-Waterman).
- Automatic reverse-complement generation.
- Built-in ambiguous alphabet handling (for example N) unless you include it in score_matrix.
- Banding/heuristics for very long sequences.
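Because reverse-complement generation is not built in, inputs such as primer2_rc have to be prepared before calling the aligner. A minimal sketch of such a helper follows. The name reverse_complement and the choice to map N to N are assumptions for illustration, not part of SGAD.

```python
# Translation table for DNA bases; N stays N, and case is preserved.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


# Recover the forward-strand primer from the reverse-complemented input
# used in the pairwise example above.
primer2_rc = "GAACAGAGGGAGAGACTAACCTTG"
primer2 = reverse_complement(primer2_rc)
print(primer2)
```

Characters outside the table are passed through unchanged, so any extra symbols still need to be covered by score_matrix.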
3D API: needleman_wunsch_3d
Signature (simplified):
needleman_wunsch_3d(
    seq1,
    seq2,
    seq3,
    score_matrix,
    gap_open=-5,
    gap_extend=-1,
    seq1_left_free=False,
    seq1_right_free=False,
    seq2_left_free=False,
    seq2_right_free=False,
    seq3_left_free=False,
    seq3_right_free=False,
) -> tuple[str, str, str, float]
Example (dimer + two primers)
from sgad import needleman_wunsch_3d

dimer = "CCTGCTACTCTGTTCCCTCAATCTGATAGGTTCC"  # anchored
primer1 = "CCTGCTACTCTGTTCCTTCACATC"  # right free only
primer2_rc = "CTGTTCCCTCAATCTGATAGGTTCC"  # left free only
seq1_left_free = seq1_right_free = False
seq2_left_free, seq2_right_free = False, True
seq3_left_free, seq3_right_free = True, False
mat = {
    "A": {"A": 2, "C": -1, "G": -1, "T": -1},
    "C": {"A": -1, "C": 2, "G": -1, "T": -1},
    "G": {"A": -1, "C": -1, "G": 2, "T": -1},
    "T": {"A": -1, "C": -1, "G": -1, "T": 2},
}
a1, a2, a3, score = needleman_wunsch_3d(
    dimer,
    primer1,
    primer2_rc,
    score_matrix=mat,
    gap_open=-5,
    gap_extend=-1,
    seq1_left_free=seq1_left_free,
    seq1_right_free=seq1_right_free,
    seq2_left_free=seq2_left_free,
    seq2_right_free=seq2_right_free,
    seq3_left_free=seq3_left_free,
    seq3_right_free=seq3_right_free,
)
print(a1)
print(a2)
print(a3)
print(score)
# Output:
# CCTGCTACTCTGTTCCCTCA-ATCTGATAGGTTCC
# CCTGCTACTCTGTTCCTTCACATC-----------
# ---------CTGTTCCCTCA-ATCTGATAGGTTCC
# 108.0
User-specified values
- score_matrix for all letters you expect.
- Affine gap penalties.
- Left/right free-end settings for each of the 3 sequences.
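Since score_matrix must cover every letter that can appear in the inputs, it can be convenient to generate it instead of typing it out. The sketch below builds the same nested-dict layout used in the examples and optionally adds a wildcard letter. The helper make_score_matrix and the neutral wildcard score of 0 are assumptions for illustration; pick values that suit your own substitution model.

```python
def make_score_matrix(
    alphabet: str = "ACGT",
    match: int = 2,
    mismatch: int = -1,
    wildcard: str = "N",
    wildcard_score: int = 0,
) -> dict[str, dict[str, int]]:
    """Build a dict-of-dicts substitution matrix, including a wildcard letter."""
    letters = alphabet + wildcard
    matrix: dict[str, dict[str, int]] = {}
    for a in letters:
        matrix[a] = {}
        for b in letters:
            if wildcard and wildcard in (a, b):
                # Any pair involving the wildcard gets the neutral score.
                matrix[a][b] = wildcard_score
            elif a == b:
                matrix[a][b] = match
            else:
                matrix[a][b] = mismatch
    return matrix


mat_with_n = make_score_matrix()
print(sorted(mat_with_n))  # letters covered by the matrix
```

Passing wildcard="" reproduces the plain four-letter matrix shown in the examples above.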
3D features
Supported:
- Exact 3-sequence DP optimum (no heuristic shortcuts).
- Sum-of-pairs substitution scoring.
- Affine gaps per sequence.
- Per-sequence free terminal gaps (left/right).
- Deterministic tie-breaking.
Not supported yet:
- Pluggable score_scale_fn (3D currently has no scaling callback parameter).
- Local alignment mode.
- Banded or heuristic memory/time reduction for long inputs.
- Automatic sequence preprocessing (reverse-complementing, case-specific cleanup).
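To illustrate what sum-of-pairs substitution scoring means, the sketch below adds up the substitution score of every pair of residues in each alignment column. It is a simplified illustration, not the SGAD scoring code: pairs that involve a gap are skipped, and gap penalties and free ends are ignored, so it is not expected to reproduce the score reported by needleman_wunsch_3d. The function name is hypothetical.

```python
from itertools import combinations


def sum_of_pairs_substitution(a1, a2, a3, score_matrix, gap_char="-"):
    """Sum substitution scores over all residue pairs in each column."""
    if not (len(a1) == len(a2) == len(a3)):
        raise ValueError("aligned rows must have equal length")
    total = 0
    for column in zip(a1, a2, a3):
        # Three rows give three pairs per column: (1,2), (1,3), (2,3).
        for x, y in combinations(column, 2):
            if x != gap_char and y != gap_char:
                total += score_matrix[x][y]
    return total


mat = {
    "A": {"A": 2, "C": -1, "G": -1, "T": -1},
    "C": {"A": -1, "C": 2, "G": -1, "T": -1},
    "G": {"A": -1, "C": -1, "G": 2, "T": -1},
    "T": {"A": -1, "C": -1, "G": -1, "T": 2},
}
print(sum_of_pairs_substitution("AC", "AC", "AG", mat))
```

A column in which all three rows agree contributes three match scores, while a column with one differing base contributes one match and two mismatches.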
Rust Backend
You can use Rust-accelerated implementations for both needleman_wunsch (2D) and
needleman_wunsch_3d (3D).
Rust 2D usage
from sgad.rust.pairwise import needleman_wunsch

a1, a2, score = needleman_wunsch(
    seq1,
    seq2,
    score_matrix=mat,
    gap_open=-5,
    gap_extend=-1,
)
Rust 2D score scaling options:
- No scaling by default (score_scale_fn=None).
- No scaling (score_scale_fn=no_score_scale_factor).
- Native Rust scaler objects created by make_rust_score_scaler.
Arbitrary Python scaling callables are not supported in the Rust 2D backend.
from sgad.rust.pairwise import make_rust_score_scaler, needleman_wunsch

rust_scaler = make_rust_score_scaler(decay_exponent=1.3, temperature=0.9)
a1, a2, score = needleman_wunsch(
    seq1,
    seq2,
    score_matrix=mat,
    score_scale_fn=rust_scaler,
)
Equivalent direct-native 2D usage:
from sgad.rust.sgad_rust_native import make_rust_score_scaler, needleman_wunsch

rust_scaler = make_rust_score_scaler(decay_exponent=1.3, temperature=0.9)
a1, a2, score = needleman_wunsch(
    seq1,
    seq2,
    score_matrix=mat,
    score_scaler_fn=rust_scaler,
)
Rust 3D usage
from sgad.rust.pairwise_3d import needleman_wunsch_3d

a1, a2, a3, score = needleman_wunsch_3d(
    seq1,
    seq2,
    seq3,
    score_matrix=mat,
    gap_open=-5,
    gap_extend=-1,
    seq1_left_free=False,
    seq1_right_free=False,
    seq2_left_free=False,
    seq2_right_free=False,
    seq3_left_free=False,
    seq3_right_free=False,
)
Equivalent direct-native 3D usage:
from sgad.rust.sgad_rust_native import needleman_wunsch_3d

a1, a2, a3, score = needleman_wunsch_3d(
    seq1,
    seq2,
    seq3,
    score_matrix=mat,
)
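If the Rust extension may be missing on some platforms, one option is to select a backend at import time. The sketch below tries the Rust wrapper first and falls back to the pure-Python function, using only the module paths shown in this README. The helper load_needleman_wunsch is hypothetical and is not provided by SGAD.

```python
import importlib


def load_needleman_wunsch():
    """Return (function, module_name) for the first available 2D backend.

    Tries the Rust wrapper first, then the pure-Python entry point.
    Returns (None, None) if neither module can be imported.
    """
    for module_name in ("sgad.rust.pairwise", "sgad"):
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue
        fn = getattr(module, "needleman_wunsch", None)
        if fn is not None:
            return fn, module_name
    return None, None


needleman_wunsch, backend = load_needleman_wunsch()
print(f"2D backend in use: {backend}")
```

Note that, per the signatures above, the Python function defaults to score_scale_factor while the Rust wrapper applies no scaling by default, so scores from the two backends may differ unless score_scale_fn is set explicitly.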
Benchmark Results
Based on benchmarks/time_complexity.csv, the Rust backend is consistently much faster
than the Python implementation for both 2D and 3D exact DP:
- 2D common-size comparison (n=500..1500) shows about 248x-252x speedup (for example, n=1500: Python 34.84s vs Rust 0.138s).
- 3D common-size comparison (n=20..100) shows about 233x-282x speedup (for example, n=100: Python 55.66s vs Rust 0.198s).
- Under the benchmark stopping rules (timeout/memory guard), Python stopped at smaller maximum sizes while Rust continued to larger sizes (2D up to n=6500, 3D up to n=260 in the recorded run).
Benchmarks were run on Ubuntu 22.04.5 LTS (Linux 6.8.0-1044-aws) on an x86_64
machine with an AMD EPYC 7R13 CPU (16 vCPUs, 8 physical cores with SMT, 32 MiB L3)
and 123 GiB RAM (no swap), using uv 0.7.15, rustc 1.87.0, and cargo 1.87.0.
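The quoted speedups follow directly from the example timings. The short calculation below reproduces them and, as a rough aid for sizing inputs, counts DP cells under the assumption of one cell per index combination for equal-length sequences (an affine-gap implementation may keep several states per cell, so treat the counts as a lower bound).

```python
def speedup(python_seconds: float, rust_seconds: float) -> float:
    """Ratio of Python runtime to Rust runtime."""
    return python_seconds / rust_seconds


# Example timings quoted in the benchmark summary above.
speedup_2d = speedup(34.84, 0.138)  # 2D, n=1500
speedup_3d = speedup(55.66, 0.198)  # 3D, n=100

# Assumed cell counts for equal-length inputs of length n.
cells_2d = (1500 + 1) ** 2
cells_3d = (100 + 1) ** 3

print(f"2D n=1500: {speedup_2d:.1f}x speedup, {cells_2d:,} cells")
print(f"3D n=100:  {speedup_3d:.1f}x speedup, {cells_3d:,} cells")
```

The cubic growth of the 3D table is why the 3D benchmark covers much shorter sequences than the 2D one.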