Exact pairwise and 3D sequence alignment utilities
SGAD
Semi-Global Alignment for Dimer calculation.
SGAD provides exact Needleman-Wunsch dynamic programming for 2-sequence and 3-sequence alignment, with symmetry-aware affine-gap scoring.
Core alignment controls in this package:
- Four free-end flags (semiglobal behavior): seq1_left_free, seq1_right_free, seq2_left_free, seq2_right_free (and seq3 variants in 3D).
- Position-biased / weighted scoring via score_scale_fn (2D API only).
- Gap-close penalty via enable_gap_close_penalty (default True), which splits the affine open-vs-extend delta into open + close terms. This improves reverse/swap symmetry behavior and keeps the DP score consistent with rescoring. Complement score invariance additionally requires a complement-symmetric substitution matrix.
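The complement-symmetry requirement mentioned above can be checked before running an alignment. The sketch below takes "complement-symmetric" to mean that a base pair and its complemented pair receive the same score, and assumes the nested-dict matrix layout used in the examples of this document. `COMPLEMENT` and `is_complement_symmetric` are illustrative names, not part of SGAD.

```python
# Hypothetical helper (not part of SGAD): checks that a substitution matrix
# gives the same score to a base pair and to its complemented pair.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def is_complement_symmetric(mat):
    """Return True if mat[a][b] == mat[comp(a)][comp(b)] for all bases a, b."""
    return all(
        mat[a][b] == mat[COMPLEMENT[a]][COMPLEMENT[b]]
        for a in COMPLEMENT
        for b in COMPLEMENT
    )


# Uniform match/mismatch matrix, as in the examples in this document.
uniform = {a: {b: (2 if a == b else -1) for b in "ACGT"} for a in "ACGT"}

# Same matrix, but with a stronger reward for G-G only (C-C left unchanged).
skewed = {a: dict(row) for a, row in uniform.items()}
skewed["G"]["G"] = 3

print(is_complement_symmetric(uniform))  # True
print(is_complement_symmetric(skewed))   # False
```

A matrix that rewards G-G more than C-C fails the check, so complementing both inputs could change the reported score even with the gap-close penalty enabled.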
New in v1.1.0
- Added symmetry-aware gap-close handling in Python and Rust 2D/3D aligners, with score parity between DP-reported score and rescored final alignment.
- Added analysis/verification scripts for:
- DP-vs-rescore consistency (2D/3D),
- symmetry sweeps (swap/reverse/complement),
- Python-vs-Rust alignment/score parity.
- Added high-level interfaces for external dimer assessment libraries: Primer3 + ntthal batch analysis and IDT OligoAnalyzer batch integration.
2D Needleman-Wunsch
needleman_wunsch(
seq1,
seq2,
score_matrix,
gap_open=-5,
gap_extend=-1,
enable_gap_close_penalty=True,
seq1_left_free=False,
seq1_right_free=False,
seq2_left_free=False,
seq2_right_free=False,
score_scale_fn=score_scale_factor,
) -> tuple[str, str, float]
Example (Python 2D)
from sgad.pairwise import needleman_wunsch, score_scale_factor, to_ascii
mat = {
"A": {"A": 2, "C": -1, "G": -1, "T": -1},
"C": {"A": -1, "C": 2, "G": -1, "T": -1},
"G": {"A": -1, "C": -1, "G": 2, "T": -1},
"T": {"A": -1, "C": -1, "G": -1, "T": 2},
}
a1, a2, score = needleman_wunsch(
"GAGATATGAGGAGAGAGAGACAGAGG",
"GAACAGAGGGAGAGACTAACCTTG",
score_matrix=mat,
gap_open=-5,
gap_extend=-1,
seq1_left_free=False,
seq1_right_free=True,
seq2_left_free=True,
seq2_right_free=False,
score_scale_fn=score_scale_factor,
)
print(to_ascii(a1, a2, False, True, True, False))
print(score)
Output:
GAGATATGAGGAGAGAGAGACAGAGG
                || |||||||
                GA-ACAGAGGGAGAGACTAACCTTG
8.97420634920635
User-specified arguments
- score_matrix: substitution model.
- gap_open, gap_extend: affine-gap parameters.
- enable_gap_close_penalty: toggle split open/close gap accounting.
- Four free-end flags: semiglobal boundary behavior per sequence side.
- score_scale_fn: per-column weighting callback.
  - Use no_score_scale_factor to disable weighting.
  - Use make_score_scaler(...) for configurable inverse-distance weighting.
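To make the free-end flags concrete, here is a minimal score-only sketch of semiglobal dynamic programming. It is not SGAD's implementation: it uses a single linear gap penalty instead of affine open/extend/close terms, applies no score scaling, and `semiglobal_score` is a hypothetical name. It also assumes that a flag such as seq2_left_free makes terminal gaps in that sequence's aligned row free on that side, which is how the 3D example output in this document reads.

```python
def semiglobal_score(seq1, seq2, match=2, mismatch=-1, gap=-2,
                     seq1_left_free=False, seq1_right_free=False,
                     seq2_left_free=False, seq2_right_free=False):
    """Best alignment score with optional free terminal gaps (linear gap cost)."""
    n, m = len(seq1), len(seq2)
    dp = [[0] * (m + 1) for _ in range(n + 1)]

    # First column: seq1 residues face leading gaps in the seq2 row.
    for i in range(1, n + 1):
        dp[i][0] = 0 if seq2_left_free else i * gap
    # First row: seq2 residues face leading gaps in the seq1 row.
    for j in range(1, m + 1):
        dp[0][j] = 0 if seq1_left_free else j * gap

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if seq1[i - 1] == seq2[j - 1] else mismatch
            dp[i][j] = max(
                dp[i - 1][j - 1] + sub,  # align seq1[i-1] with seq2[j-1]
                dp[i - 1][j] + gap,      # gap in the seq2 row
                dp[i][j - 1] + gap,      # gap in the seq1 row
            )

    # Free right ends: the alignment may stop early on the last row/column.
    candidates = [dp[n][m]]
    if seq2_right_free:  # trailing gaps in the seq2 row cost nothing
        candidates += [dp[i][m] for i in range(n + 1)]
    if seq1_right_free:  # trailing gaps in the seq1 row cost nothing
        candidates += [dp[n][j] for j in range(m + 1)]
    return max(candidates)


print(semiglobal_score("ACGT", "CG"))                                             # 0
print(semiglobal_score("ACGT", "CG", seq2_left_free=True))                        # 2
print(semiglobal_score("ACGT", "CG", seq2_left_free=True, seq2_right_free=True))  # 4
```

Each flag that is switched on removes one terminal gap charge: the global score 0 rises to 2 when the leading gap is free and to 4 when both terminal gaps are free.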
Rust 2D usage
Rust wrapper API is under sgad.rust. score_scale_fn must be either None
or a RustScoreScaler object from make_rust_score_scaler(...).
from sgad.pairwise import to_ascii
from sgad.rust.pairwise import make_rust_score_scaler, needleman_wunsch
mat = {
"A": {"A": 2, "C": -1, "G": -1, "T": -1},
"C": {"A": -1, "C": 2, "G": -1, "T": -1},
"G": {"A": -1, "C": -1, "G": 2, "T": -1},
"T": {"A": -1, "C": -1, "G": -1, "T": 2},
}
rust_scaler = make_rust_score_scaler(decay_exponent=1.0, temperature=1.0)
a1, a2, score = needleman_wunsch(
"GAGATATGAGGAGAGAGAGACAGAGG",
"GAACAGAGGGAGAGACTAACCTTG",
score_matrix=mat,
gap_open=-5,
gap_extend=-1,
seq1_left_free=False,
seq1_right_free=True,
seq2_left_free=True,
seq2_right_free=False,
enable_gap_close_penalty=True,
score_scale_fn=rust_scaler,
)
print(to_ascii(a1, a2, False, True, True, False))
print(score)
Output:
GAGATATGAGGAGAGAGAGACAGAGG
                || |||||||
                GA-ACAGAGGGAGAGACTAACCTTG
8.97420634920635
Rust 2D multiprocessing caveat
RustScoreScaler objects are not picklable. In process-based parallelism
(multiprocessing, joblib loky), build the scaler inside each worker (or use
score_scale_fn=None). Thread-based execution avoids this serialization issue.
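The pattern of building the scaler inside each worker can be sketched as follows. To keep the example runnable without SGAD installed, it uses a stand-in class that is unpicklable for a different reason (it holds a lock) and a toy scoring function; `StandInScaler`, `score_pair`, `score_all`, and `is_picklable` are all hypothetical names. In real use, the marked line would call make_rust_score_scaler(...) and the worker would call the Rust needleman_wunsch.

```python
import pickle
import threading
from concurrent.futures import ProcessPoolExecutor


class StandInScaler:
    """Stand-in for an unpicklable scaler object (NOT the SGAD RustScoreScaler)."""

    def __init__(self):
        # A lock cannot be pickled, which makes the whole object unpicklable.
        self._lock = threading.Lock()

    def __call__(self, position):
        return 1.0


def score_pair(pair):
    """Worker: receives only picklable strings and builds the scaler locally."""
    seq1, seq2 = pair
    scaler = StandInScaler()  # in real use: make_rust_score_scaler(...)
    # Toy score: weighted count of identical positions (not an alignment).
    return sum(scaler(i) for i, (a, b) in enumerate(zip(seq1, seq2)) if a == b)


def score_all(pairs, max_workers=2):
    """Fan the pairs out to worker processes; only strings cross the boundary."""
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(score_pair, pairs))


def is_picklable(obj):
    """Return True if obj survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        return False


print(is_picklable(StandInScaler()))     # False
print(is_picklable(("ACGT", "ACGA")))    # True
print(score_pair(("ACGT", "ACGA")))      # 3.0
```

When launching the pool, call score_all(pairs) from under an `if __name__ == "__main__":` guard so that start methods which re-import the main module do not spawn workers recursively.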
3D Needleman-Wunsch
Important: 3D currently does not expose score scaling (score_scale_fn).
needleman_wunsch_3d(
seq1,
seq2,
seq3,
score_matrix,
gap_open=-5,
gap_extend=-1,
enable_gap_close_penalty=True,
seq1_left_free=False,
seq1_right_free=False,
seq2_left_free=False,
seq2_right_free=False,
seq3_left_free=False,
seq3_right_free=False,
) -> tuple[str, str, str, float]
Example (Python 3D)
from sgad.pairwise_3d import needleman_wunsch_3d
mat = {
"A": {"A": 2, "C": -1, "G": -1, "T": -1},
"C": {"A": -1, "C": 2, "G": -1, "T": -1},
"G": {"A": -1, "C": -1, "G": 2, "T": -1},
"T": {"A": -1, "C": -1, "G": -1, "T": 2},
}
a1, a2, a3, score = needleman_wunsch_3d(
"CCTGCTACTCTGTTCCCTCAATCTGATAGGTTCC",
"CCTGCTACTCTGTTCCTTCACATC",
"CTGTTCCCTCAATCTGATAGGTTCC",
score_matrix=mat,
gap_open=-5,
gap_extend=-1,
seq1_left_free=False,
seq1_right_free=False,
seq2_left_free=False,
seq2_right_free=True,
seq3_left_free=True,
seq3_right_free=False,
)
print(a1)
print(a2)
print(a3)
print(score)
Output:
CCTGCTACTCTGTTCCCTCA-ATCTGATAGGTTCC
CCTGCTACTCTGTTCCTTCACATC-----------
---------CTGTTCCCTCA-ATCTGATAGGTTCC
108.0
User-specified arguments
- score_matrix: substitution model (sum-of-pairs scoring in 3D).
- gap_open, gap_extend: affine-gap parameters.
- enable_gap_close_penalty: toggle split open/close gap accounting.
- Six free-end flags: semiglobal boundary behavior for all sequence sides.
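As a rough illustration of sum-of-pairs scoring, the sketch below adds up the substitution score of every residue pair in each alignment column. It covers the substitution part only: pairs involving a gap are skipped and no gap open/extend/close penalties are applied, so it will not reproduce the scores reported by needleman_wunsch_3d. `sum_of_pairs_substitution` is a hypothetical helper, not an SGAD function.

```python
from itertools import combinations

GAP = "-"


def sum_of_pairs_substitution(rows, mat):
    """Sum mat[a][b] over all residue pairs in every column of equal-length rows."""
    total = 0
    for column in zip(*rows):
        for a, b in combinations(column, 2):
            # Pairs that involve a gap carry no substitution score here.
            if a != GAP and b != GAP:
                total += mat[a][b]
    return total


mat = {a: {b: (2 if a == b else -1) for b in "ACGT"} for a in "ACGT"}

print(sum_of_pairs_substitution(["AC", "AC", "A-"], mat))  # 8
print(sum_of_pairs_substitution(["AC", "AG", "AT"], mat))  # 3
```

A fully matching three-way column contributes three pairwise matches (6 with this matrix), which is why 3D scores grow faster per column than 2D scores.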
Rust 3D usage
from sgad.rust.pairwise_3d import needleman_wunsch_3d
mat = {
"A": {"A": 2, "C": -1, "G": -1, "T": -1},
"C": {"A": -1, "C": 2, "G": -1, "T": -1},
"G": {"A": -1, "C": -1, "G": 2, "T": -1},
"T": {"A": -1, "C": -1, "G": -1, "T": 2},
}
a1, a2, a3, score = needleman_wunsch_3d(
"CCTGCTACTCTGTTCCCTCAATCTGATAGGTTCC",
"CCTGCTACTCTGTTCCTTCACATC",
"CTGTTCCCTCAATCTGATAGGTTCC",
score_matrix=mat,
gap_open=-5,
gap_extend=-1,
seq1_left_free=False,
seq1_right_free=False,
seq2_left_free=False,
seq2_right_free=True,
seq3_left_free=True,
seq3_right_free=False,
enable_gap_close_penalty=True,
)
print(a1)
print(a2)
print(a3)
print(score)
Output:
CCTGCTACTCTGTTCCCTCA-ATCTGATAGGTTCC
CCTGCTACTCTGTTCCTTCACATC-----------
---------CTGTTCCCTCA-ATCTGATAGGTTCC
108.0
Rust 3D multiprocessing caveat
Unlike Rust 2D, there is no scaler object to serialize. For process pools, ensure worker-call arguments remain picklable and import the Rust module in the worker runtime as usual.
Benchmarking Python vs Rust backends
Based on benchmarks/time_complexity.csv, the Rust backend is consistently much faster
than the Python implementation for both 2D and 3D exact DP:
- 2D common-size comparison (n=500..1500) shows about 248x-252x speedup (for example, n=1500: Python 34.84 s vs Rust 0.138 s).
- 3D common-size comparison (n=20..100) shows about 233x-282x speedup (for example, n=100: Python 55.66 s vs Rust 0.198 s).
- Under the benchmark stopping rules (timeout/memory guard), Python stopped at smaller maximum sizes while Rust continued to larger sizes (2D up to n=6500, 3D up to n=260 in the recorded run).
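The quoted speedups follow directly from the example timings. A quick check, using only the rounded numbers given above:

```python
# Example timings quoted above, in seconds: (Python, Rust).
timings = {"2D, n=1500": (34.84, 0.138), "3D, n=100": (55.66, 0.198)}

# Speedup is the ratio of Python time to Rust time.
speedups = {label: py_s / rust_s for label, (py_s, rust_s) in timings.items()}

for label, factor in speedups.items():
    print(f"{label}: {factor:.0f}x")
# 2D, n=1500: 252x
# 3D, n=100: 281x
```

The 3D ratio comes out at 281x rather than 282x because the inputs here are the rounded timings, not the raw values in benchmarks/time_complexity.csv.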
Benchmarks were run on Ubuntu 22.04.5 LTS (Linux 6.8.0-1044-aws) on an x86_64
machine with an AMD EPYC 7R13 CPU (16 vCPUs, 8 physical cores with SMT, 32 MiB L3)
and 123 GiB RAM (no swap), using uv 0.7.15, rustc 1.87.0, and cargo 1.87.0.
Interface to external dimer assessment libraries
Primer3 interface
from sgad.api.primer3 import heterodimer_batch_primer3
df = heterodimer_batch_primer3(
primer1_seqs=["ACGTACGT"],
primer2_seqs=["TGCATGCA"],
primer1_names=["fwd_1"],
primer2_names=["rev_1"],
n_jobs=1,
)
print(df[["primer1_name", "primer2_name", "primer3_tm", "ntthal_t"]].to_string(index=False))
Output:
primer1_name primer2_name primer3_tm ntthal_t
fwd_1 rev_1 -70.205833 -70.2058
IDT OligoAnalyzer interface
from sgad.api.idt import heterodimer_batch_idt
res = heterodimer_batch_idt(
primer1_seqs=["ACGTACGT"],
primer2_seqs=["TGCATGCA"],
primer1_names=["fwd_1"],
primer2_names=["rev_1"],
client_id="invalid",
client_secret="invalid",
idt_username="invalid",
idt_password="invalid",
timeout_s=5.0,
max_retries=1,
raise_on_error=False,
)
print(res[0])
This example intentionally uses invalid credentials to show the failure-record
shape returned when raise_on_error=False.
Output:
{'primer1_name': 'fwd_1', 'primer2_name': 'rev_1', 'primer1': 'ACGTACGT', 'primer2': 'TGCATGCA', 'ok': False, 'response': None, 'status_code': 400, 'error': '400 Client Error: Bad Request for url: https://www.idtdna.com/Identityserver/connect/token'}
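With raise_on_error=False, callers have to separate failed records from successful ones themselves. A minimal sketch, assuming every record carries the keys shown in the failure record above; the success record and its 'response' payload are invented for illustration and do not reflect the real IDT response format.

```python
# Records shaped like the failure record above. The second (successful)
# record and its 'response' payload are made up for illustration.
results = [
    {"primer1_name": "fwd_1", "primer2_name": "rev_1", "ok": False,
     "response": None, "status_code": 400,
     "error": "400 Client Error: Bad Request"},
    {"primer1_name": "fwd_2", "primer2_name": "rev_2", "ok": True,
     "response": {"placeholder": "payload"}, "status_code": 200,
     "error": None},
]

# Split on the 'ok' flag instead of relying on exceptions.
succeeded = [r for r in results if r["ok"]]
failed = [r for r in results if not r["ok"]]

for r in failed:
    print(f"{r['primer1_name']}/{r['primer2_name']}: "
          f"HTTP {r['status_code']}: {r['error']}")
# fwd_1/rev_1: HTTP 400: 400 Client Error: Bad Request
```

Keeping failures as data makes it possible to retry only the failed primer pairs after fixing credentials or network issues.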