sequence_align

Efficient implementations of Needleman-Wunsch and other sequence alignment algorithms written in Rust with Python bindings via PyO3.

Installation

sequence_align is distributed via PyPI for Python 3.9 - 3.13, so installation is a single command, with no special setup required for cross-platform compatibility, Rust installation, etc.:

pip install sequence_align

Alternatively, to develop sequence_align, first ensure that both Python and Rust are installed on your system. Then install Maturin and run maturin develop (optionally with the -r flag to compile a release build instead of an unoptimized debug build) from the root of your cloned repo to build and install sequence_align in your active Python environment.

Quick Start

Pairwise sequence algorithms are available in sequence_align.pairwise. Currently, two algorithms are implemented: the Needleman-Wunsch algorithm and Hirschberg’s algorithm. Needleman-Wunsch is commonly used for global sequence alignment, but it uses O(M*N) space, where M and N are the lengths of the two sequences being aligned. Hirschberg’s algorithm modifies Needleman-Wunsch to keep the same time complexity, O(M*N), while using only O(min{M, N}) space, making it an appealing option for memory-limited applications or extremely large sequences.
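To give a feel for where the space savings come from, here is a minimal pure-Python sketch of the Needleman-Wunsch scoring recurrence that keeps only two rows of the dynamic-programming table at a time. This is an illustration only, not the library's Rust implementation; the function name nw_score_linear_space is hypothetical. It returns only the optimal score, not the alignment itself. Hirschberg's algorithm builds on this kind of row-by-row pass to recover the full alignment without storing the whole table.

```python
from typing import Sequence


def nw_score_linear_space(
    seq_a: Sequence[str],
    seq_b: Sequence[str],
    match_score: float = 1.0,
    mismatch_score: float = -1.0,
    indel_score: float = -1.0,
) -> float:
    """Optimal global alignment score using O(min(M, N)) memory.

    Hypothetical helper for illustration; not part of sequence_align.
    """
    # The score is symmetric in its arguments, so make seq_b the shorter
    # sequence: each row then has min(M, N) + 1 entries.
    if len(seq_b) > len(seq_a):
        seq_a, seq_b = seq_b, seq_a

    # Row 0: an empty prefix of seq_a aligned against each prefix of seq_b.
    prev = [j * indel_score for j in range(len(seq_b) + 1)]
    for i, a in enumerate(seq_a, start=1):
        # Column 0: a prefix of seq_a aligned against an empty seq_b.
        curr = [i * indel_score]
        for j, b in enumerate(seq_b, start=1):
            diag = prev[j - 1] + (match_score if a == b else mismatch_score)
            up = prev[j] + indel_score
            left = curr[j - 1] + indel_score
            curr.append(max(diag, up, left))
        # Only the most recent row is needed for the next iteration.
        prev = curr
    return prev[-1]
```

The full table is never materialized, which is why the score alone can be computed in linear space; the extra work in Hirschberg's algorithm goes into reconstructing the alignment path by divide and conquer.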

One may also compute the Needleman-Wunsch alignment score for alignments produced by either algorithm using sequence_align.pairwise.alignment_score.

Using these algorithms is straightforward:

from sequence_align.pairwise import alignment_score, hirschberg, needleman_wunsch


# See https://en.wikipedia.org/wiki/Needleman%E2%80%93Wunsch_algorithm#/media/File:Needleman-Wunsch_pairwise_sequence_alignment.png
# Use Needleman-Wunsch default scores (match=1, mismatch=-1, indel=-1)
seq_a = ["G", "A", "T", "T", "A", "C", "A"]
seq_b = ["G", "C", "A", "T", "G", "C", "G"]

aligned_seq_a, aligned_seq_b = needleman_wunsch(
    seq_a,
    seq_b,
    match_score=1.0,
    mismatch_score=-1.0,
    indel_score=-1.0,
    gap="_",
)

# Expects ["G", "_", "A", "T", "T", "A", "C", "A"]
print(aligned_seq_a)

# Expects ["G", "C", "A", "_", "T", "G", "C", "G"]
print(aligned_seq_b)

# Expects 0
score = alignment_score(
    aligned_seq_a,
    aligned_seq_b,
    match_score=1.0,
    mismatch_score=-1.0,
    indel_score=-1.0,
    gap="_",
)
print(score)


# See https://en.wikipedia.org/wiki/Hirschberg%27s_algorithm#Example
seq_a = ["A", "G", "T", "A", "C", "G", "C", "A"]
seq_b = ["T", "A", "T", "G", "C"]

aligned_seq_a, aligned_seq_b = hirschberg(
    seq_a,
    seq_b,
    match_score=2.0,
    mismatch_score=-1.0,
    indel_score=-2.0,
    gap="_",
)

# Expects ["A", "G", "T", "A", "C", "G", "C", "A"]
print(aligned_seq_a)

# Expects ["_", "_", "T", "A", "T", "G", "C", "_"]
print(aligned_seq_b)

# Expects 1
score = alignment_score(
    aligned_seq_a,
    aligned_seq_b,
    match_score=2.0,
    mismatch_score=-1.0,
    indel_score=-2.0,
    gap="_",
)
print(score)
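
The expected scores in the examples above can be checked by hand with a column-by-column sum. The following is a minimal pure-Python sketch of that calculation, not the library's implementation: the function name score_alignment is hypothetical, and treating a column with a gap in either sequence as a single indel is an assumption made for illustration.

```python
from typing import Sequence


def score_alignment(
    aligned_a: Sequence[str],
    aligned_b: Sequence[str],
    match_score: float = 1.0,
    mismatch_score: float = -1.0,
    indel_score: float = -1.0,
    gap: str = "_",
) -> float:
    """Sum per-column scores of two already-aligned sequences.

    Hypothetical helper for illustration; not part of sequence_align.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    total = 0.0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            # Assumption: any column containing a gap counts as one indel.
            total += indel_score
        elif a == b:
            total += match_score
        else:
            total += mismatch_score
    return total


# Needleman-Wunsch example: 4 matches, 2 mismatches, 2 indels.
nw_score = score_alignment(
    ["G", "_", "A", "T", "T", "A", "C", "A"],
    ["G", "C", "A", "_", "T", "G", "C", "G"],
)

# Hirschberg example: 4 matches, 1 mismatch, 3 indels.
hirschberg_score = score_alignment(
    ["A", "G", "T", "A", "C", "G", "C", "A"],
    ["_", "_", "T", "A", "T", "G", "C", "_"],
    match_score=2.0,
    mismatch_score=-1.0,
    indel_score=-2.0,
)
```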

Performance Benchmarks

All tests below were conducted sequentially on an AWS R5.4 instance with 16 cores and 128 GB of memory. Each pair of sequences for alignment consists of a sequence of randomly selected A/C/G/T nucleotide bases and a second sequence that is identical, except that 10% of its characters are randomly perturbed by deletion, by insertion of another randomly selected character after the entry, or by replacement with a different randomly selected character.

As one can see, while sequence_align is comparable to some other toolkits in terms of speed, its memory performance is best-in-class, even when compared to toolkits using the same algorithm, such as pyseq-align, which also uses Needleman-Wunsch.

(Please note that some lines terminate early, as some toolkits took prohibitively long and/or ran out of memory at higher scales.)
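
For readers who want to build similar test inputs, the sequence-pair construction described in this section could be sketched roughly as follows. This is not the benchmark code itself. The function name make_benchmark_pair and the seed parameter are hypothetical, each position is perturbed independently with probability 0.1 rather than exactly 10% of positions being chosen, and the three perturbation types are assumed to be equally likely.

```python
import random
from typing import List, Tuple

BASES = "ACGT"


def make_benchmark_pair(
    length: int,
    perturb_rate: float = 0.1,
    seed: int = 0,
) -> Tuple[List[str], List[str]]:
    """Return (original, perturbed) nucleotide sequences.

    Hypothetical helper for illustration; not the actual benchmark code.
    """
    rng = random.Random(seed)
    original = [rng.choice(BASES) for _ in range(length)]

    perturbed: List[str] = []
    for base in original:
        if rng.random() >= perturb_rate:
            # Unperturbed position: copy the base through unchanged.
            perturbed.append(base)
            continue
        # Assumption: deletion, insertion, and replacement are equally likely.
        kind = rng.choice(("delete", "insert", "replace"))
        if kind == "delete":
            continue
        if kind == "insert":
            # Keep the base and insert a random character after it.
            perturbed.extend([base, rng.choice(BASES)])
        else:
            # Replace with a different randomly selected base.
            perturbed.append(rng.choice([b for b in BASES if b != base]))
    return original, perturbed
```

The two returned lists can be passed directly as seq_a and seq_b to needleman_wunsch or hirschberg.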

License

Licensed under the Apache 2.0 License. Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Copyright 2023-present Kensho Technologies, LLC. The present date is determined by the timestamp of the most recent commit in the repository.
