Efficient implementations of Needleman-Wunsch and other sequence alignment algorithms in Rust with Python bindings.
sequence_align
Efficient implementations of Needleman-Wunsch and other sequence alignment algorithms written in Rust with Python bindings via PyO3.
Installation
sequence_align is distributed via PyPI for Python 3.7+, so installation is as simple as running the following command. No special setup is required for cross-platform compatibility, a Rust installation, etc.
pip install sequence_align
Alternatively, if one wishes to develop for sequence_align, first ensure that both Python and Rust are installed on your system. Then install Maturin and run maturin develop (optionally with the -r flag to compile a release build instead of an unoptimized debug build) from the root of your cloned repository to build and install sequence_align in your active Python environment.
Quick Start
Pairwise sequence alignment algorithms are available in sequence_align.pairwise. Currently, two algorithms are implemented: the Needleman-Wunsch algorithm and Hirschberg's algorithm. Needleman-Wunsch is commonly used for global sequence alignment, but it requires O(M*N) space, where M and N are the lengths of the two sequences being aligned. Hirschberg's algorithm modifies Needleman-Wunsch to keep the same time complexity (O(M*N)) while using only O(min{M, N}) space, making it an appealing option for memory-limited applications or extremely large sequences.
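To make the O(M*N) space cost concrete, here is a minimal pure-Python sketch of textbook Needleman-Wunsch with a linear indel penalty. It is not the library's Rust implementation: the function name needleman_wunsch_sketch is hypothetical, and it returns the score alongside the alignment for convenience. Its traceback breaks ties in a fixed order (diagonal, then gap in seq_b, then gap in seq_a), so it may return a different but equally scoring alignment than sequence_align does.

```python
from typing import List, Sequence, Tuple


def needleman_wunsch_sketch(
    seq_a: Sequence[str],
    seq_b: Sequence[str],
    match_score: float = 1.0,
    mismatch_score: float = -1.0,
    indel_score: float = -1.0,
    gap: str = "_",
) -> Tuple[List[str], List[str], float]:
    m, n = len(seq_a), len(seq_b)

    def sub(i: int, j: int) -> float:
        # Score for placing seq_a[i - 1] opposite seq_b[j - 1].
        return match_score if seq_a[i - 1] == seq_b[j - 1] else mismatch_score

    # Full (m + 1) x (n + 1) score matrix: this is where the O(M*N) space goes.
    score = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        score[i][0] = i * indel_score
    for j in range(1, n + 1):
        score[0][j] = j * indel_score
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # match or mismatch
                score[i - 1][j] + indel_score,  # seq_a[i - 1] opposite a gap
                score[i][j - 1] + indel_score,  # seq_b[j - 1] opposite a gap
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    aligned_a: List[str] = []
    aligned_b: List[str] = []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            aligned_a.append(seq_a[i - 1])
            aligned_b.append(seq_b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + indel_score:
            aligned_a.append(seq_a[i - 1])
            aligned_b.append(gap)
            i -= 1
        else:
            aligned_a.append(gap)
            aligned_b.append(seq_b[j - 1])
            j -= 1
    aligned_a.reverse()
    aligned_b.reverse()
    return aligned_a, aligned_b, score[m][n]
```

Keeping only two rolling rows of this matrix would still yield the final score in linear space, but it discards the information the traceback needs; recovering the alignment itself in linear space is the problem Hirschberg's divide-and-conquer solves.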
One may also compute the Needleman-Wunsch alignment score for alignments produced by either algorithm using sequence_align.pairwise.alignment_score.
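Scoring an existing alignment needs only one pass over its columns. The sketch below assumes that alignment_score sums a per-column score: indel_score for any column containing the gap symbol, and match_score or mismatch_score otherwise. The name alignment_score_sketch is hypothetical, and the library's handling of edge cases (for example, a column in which both entries are gaps) may differ.

```python
from typing import Sequence


def alignment_score_sketch(
    aligned_a: Sequence[str],
    aligned_b: Sequence[str],
    match_score: float = 1.0,
    mismatch_score: float = -1.0,
    indel_score: float = -1.0,
    gap: str = "_",
) -> float:
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    total = 0.0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            # Assumption: any column containing a gap counts as one indel.
            total += indel_score
        elif x == y:
            total += match_score
        else:
            total += mismatch_score
    return total
```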
Using these algorithms is straightforward:
```python
from sequence_align.pairwise import alignment_score, hirschberg, needleman_wunsch

# See https://en.wikipedia.org/wiki/Needleman%E2%80%93Wunsch_algorithm#/media/File:Needleman-Wunsch_pairwise_sequence_alignment.png
# Use Needleman-Wunsch default scores (match=1, mismatch=-1, indel=-1)
seq_a = ["G", "A", "T", "T", "A", "C", "A"]
seq_b = ["G", "C", "A", "T", "G", "C", "G"]

aligned_seq_a, aligned_seq_b = needleman_wunsch(
    seq_a,
    seq_b,
    match_score=1.0,
    mismatch_score=-1.0,
    indel_score=-1.0,
    gap="_",
)

# Expects ["G", "_", "A", "T", "T", "A", "C", "A"]
print(aligned_seq_a)

# Expects ["G", "C", "A", "_", "T", "G", "C", "G"]
print(aligned_seq_b)

# Expects 0
score = alignment_score(
    aligned_seq_a,
    aligned_seq_b,
    match_score=1.0,
    mismatch_score=-1.0,
    indel_score=-1.0,
    gap="_",
)
print(score)

# See https://en.wikipedia.org/wiki/Hirschberg%27s_algorithm#Example
seq_a = ["A", "G", "T", "A", "C", "G", "C", "A"]
seq_b = ["T", "A", "T", "G", "C"]

aligned_seq_a, aligned_seq_b = hirschberg(
    seq_a,
    seq_b,
    match_score=2.0,
    mismatch_score=-1.0,
    indel_score=-2.0,
    gap="_",
)

# Expects ["A", "G", "T", "A", "C", "G", "C", "A"]
print(aligned_seq_a)

# Expects ["_", "_", "T", "A", "T", "G", "C", "_"]
print(aligned_seq_b)

# Expects 1
score = alignment_score(
    aligned_seq_a,
    aligned_seq_b,
    match_score=2.0,
    mismatch_score=-1.0,
    indel_score=-2.0,
    gap="_",
)
print(score)
```
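The memory savings of hirschberg come from never materializing the full score matrix. The following pure-Python sketch of the textbook divide-and-conquer scheme is illustrative only (hypothetical names hirschberg_sketch and _nw_last_row, not the library's code): it computes forward scores for the first half of seq_a and backward scores for the second half, each with just two rows, picks the split point in seq_b that maximizes their sum, and recurses on the two halves. Score rows have length len(seq_b) + 1, so passing the shorter sequence as seq_b keeps them at O(min{M, N}).

```python
from typing import List, Sequence, Tuple


def _nw_last_row(
    a: Sequence[str], b: Sequence[str], match: float, mismatch: float, indel: float
) -> List[float]:
    """Needleman-Wunsch scores of all of `a` against every prefix of `b`."""
    prev = [j * indel for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * indel] + [0.0] * len(b)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(prev[j - 1] + sub, prev[j] + indel, curr[j - 1] + indel)
        prev = curr  # older row discarded: O(len(b)) space
    return prev


def hirschberg_sketch(
    seq_a: Sequence[str],
    seq_b: Sequence[str],
    match_score: float = 1.0,
    mismatch_score: float = -1.0,
    indel_score: float = -1.0,
    gap: str = "_",
) -> Tuple[List[str], List[str]]:
    a, b = list(seq_a), list(seq_b)
    scores = (match_score, mismatch_score, indel_score)
    if not a:
        return [gap] * len(b), b
    if not b:
        return a, [gap] * len(a)
    if len(a) == 1:
        # Base case: a[0] sits opposite one symbol of b, or opposite a gap.
        n = len(b)
        best_j, best = -1, (n + 1) * indel_score
        for j in range(n):
            s = (match_score if a[0] == b[j] else mismatch_score) + (n - 1) * indel_score
            if s > best:
                best_j, best = j, s
        if best_j < 0:
            return [a[0]] + [gap] * n, [gap] + b
        aligned_a = [gap] * n
        aligned_a[best_j] = a[0]
        return aligned_a, b
    mid = len(a) // 2
    # Forward scores for a[:mid]; backward scores for a[mid:] via reversed inputs.
    left = _nw_last_row(a[:mid], b, *scores)
    right = _nw_last_row(a[mid:][::-1], b[::-1], *scores)
    n = len(b)
    # Split b at the column where the two halves' scores sum to the optimum.
    k = max(range(n + 1), key=lambda j: left[j] + right[n - j])
    a1, b1 = hirschberg_sketch(a[:mid], b[:k], *scores, gap=gap)
    a2, b2 = hirschberg_sketch(a[mid:], b[k:], *scores, gap=gap)
    return a1 + a2, b1 + b2
```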
Performance Benchmarks
All tests below were conducted sequentially on an AWS R5.4 instance with 16 cores and 128 GB of memory. Each pair of sequences to be aligned consists of a sequence of randomly selected A/C/G/T nucleotide bases and a copy of it in which 10% of the characters are randomly perturbed by deletion, by insertion of another randomly selected character after the entry, or by replacement with a different randomly selected character.
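Input pairs of the kind described above can be generated with a sketch like the following. The generator actually used for these benchmarks is not shown here, so the details are assumptions: the three perturbation types are chosen uniformly, exactly floor(10%) of positions are perturbed, and a seeded random.Random makes runs repeatable. make_benchmark_pair is a hypothetical name.

```python
import random
from typing import List, Tuple

BASES = "ACGT"


def make_benchmark_pair(
    length: int, perturb_frac: float = 0.1, seed: int = 0
) -> Tuple[List[str], List[str]]:
    rng = random.Random(seed)
    seq_a = [rng.choice(BASES) for _ in range(length)]
    # Choose which positions of seq_a get perturbed in the copy.
    to_perturb = set(rng.sample(range(length), int(length * perturb_frac)))
    seq_b: List[str] = []
    for i, base in enumerate(seq_a):
        if i not in to_perturb:
            seq_b.append(base)
            continue
        op = rng.choice(("delete", "insert", "replace"))
        if op == "delete":
            continue  # drop this base from the copy
        if op == "insert":
            seq_b.append(base)
            seq_b.append(rng.choice(BASES))  # random base inserted after the entry
        else:
            # Replace with a base guaranteed to differ from the original.
            seq_b.append(rng.choice([b for b in BASES if b != base]))
    return seq_a, seq_b
```

Each perturbation changes the copy's length by at most one, so with the default 10% rate the second sequence stays within 10% of the first sequence's length.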
As one can see, while sequence_align is comparable to some other toolkits in terms of speed, its memory performance is best-in-class, even compared with toolkits that use the same algorithm, such as the Needleman-Wunsch implementation in pyseq-align.
(Please note that some lines terminate early, as some toolkits took prohibitively long and/or ran out of memory at higher scales.)
License
Licensed under the Apache 2.0 License. Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
Copyright 2023-present Kensho Technologies, LLC. The present date is determined by the timestamp of the most recent commit in the repository.