sequence_align

Efficient implementations of Needleman-Wunsch and other sequence alignment algorithms written in Rust with Python bindings via PyO3.

Installation

sequence_align is distributed via PyPI for Python 3.7+, so installation is as simple as the following, with no special setup required for cross-platform compatibility, Rust installation, etc.:

pip install sequence_align

Alternatively, to develop sequence_align, first ensure that both Python and Rust are installed on your system. Then install Maturin and run maturin develop (optionally with the -r flag to compile a release build instead of an unoptimized debug build) from the root of your cloned repo to build and install sequence_align in your active Python environment.

Quick Start

Pairwise sequence algorithms are available in sequence_align.pairwise. Currently, two algorithms are implemented: the Needleman-Wunsch algorithm and Hirschberg’s algorithm. Needleman-Wunsch is commonly used for global sequence alignment, but it requires O(M*N) space, where M and N are the lengths of the two sequences being aligned. Hirschberg’s algorithm modifies Needleman-Wunsch to keep the same time complexity (O(M*N)) while using only O(min{M, N}) space, making it an appealing option for memory-limited applications or extremely large sequences.
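To give a feel for where the space savings come from, here is a minimal pure-Python sketch of the linear-space score computation that Hirschberg’s algorithm builds on: each row of the Needleman-Wunsch score matrix depends only on the row before it, so two rows suffice to obtain the optimal score. The function names nw_score_last_row and nw_optimal_score are hypothetical and are not part of sequence_align; the library’s Rust implementation may differ in its details.

```python
from typing import List, Sequence


def nw_score_last_row(
    seq_a: Sequence[str],
    seq_b: Sequence[str],
    match_score: float = 1.0,
    mismatch_score: float = -1.0,
    indel_score: float = -1.0,
) -> List[float]:
    """Return the last row of the Needleman-Wunsch score matrix.

    Illustrative helper (not part of sequence_align). Only two rows of
    length len(seq_b) + 1 are alive at any time.
    """
    # Row 0: aligning an empty prefix of seq_a against prefixes of seq_b.
    prev = [j * indel_score for j in range(len(seq_b) + 1)]
    for i, a in enumerate(seq_a, start=1):
        # Column 0: aligning a prefix of seq_a against an empty prefix of seq_b.
        curr = [i * indel_score]
        for j, b in enumerate(seq_b, start=1):
            diag = prev[j - 1] + (match_score if a == b else mismatch_score)
            up = prev[j] + indel_score
            left = curr[j - 1] + indel_score
            curr.append(max(diag, up, left))
        prev = curr
    return prev


def nw_optimal_score(seq_a: Sequence[str], seq_b: Sequence[str], **scores: float) -> float:
    """Optimal global alignment score using O(min(M, N)) extra space."""
    # The scoring is symmetric, so put the shorter sequence along the columns.
    if len(seq_b) > len(seq_a):
        seq_a, seq_b = seq_b, seq_a
    return nw_score_last_row(seq_a, seq_b, **scores)[-1]
```

This sketch only recovers the score. Hirschberg’s algorithm additionally splits the problem recursively, using forward and reverse last rows to find where the optimal alignment crosses the middle of the first sequence, which is how it recovers the alignment itself without storing the full matrix.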

One may also compute the Needleman-Wunsch alignment score for alignments produced by either algorithm using sequence_align.pairwise.alignment_score.
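As a rough illustration of what such a score means, the sketch below sums per-position scores over two already-aligned sequences. The name reference_alignment_score is hypothetical, and this is a plain-Python reading of the scoring scheme rather than the library’s own code; in particular, treating a position where both entries are gaps as an indel is an assumption.

```python
from typing import Sequence


def reference_alignment_score(
    aligned_a: Sequence[str],
    aligned_b: Sequence[str],
    match_score: float = 1.0,
    mismatch_score: float = -1.0,
    indel_score: float = -1.0,
    gap: str = "_",
) -> float:
    """Sum match/mismatch/indel scores position by position.

    Illustrative helper (not part of sequence_align).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("Aligned sequences must have the same length")
    total = 0.0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            # Assumption: any position containing a gap counts as one indel.
            total += indel_score
        elif a == b:
            total += match_score
        else:
            total += mismatch_score
    return total
```

Applied to the two aligned pairs shown in the examples that follow, this sketch gives the same scores as stated there (0 and 1, respectively).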

Using these algorithms is straightforward:

from sequence_align.pairwise import alignment_score, hirschberg, needleman_wunsch


# See https://en.wikipedia.org/wiki/Needleman%E2%80%93Wunsch_algorithm#/media/File:Needleman-Wunsch_pairwise_sequence_alignment.png
# Use Needleman-Wunsch default scores (match=1, mismatch=-1, indel=-1)
seq_a = ["G", "A", "T", "T", "A", "C", "A"]
seq_b = ["G", "C", "A", "T", "G", "C", "G"]

aligned_seq_a, aligned_seq_b = needleman_wunsch(
    seq_a,
    seq_b,
    match_score=1.0,
    mismatch_score=-1.0,
    indel_score=-1.0,
    gap="_",
)

# Expects ["G", "_", "A", "T", "T", "A", "C", "A"]
print(aligned_seq_a)

# Expects ["G", "C", "A", "_", "T", "G", "C", "G"]
print(aligned_seq_b)

# Expects 0
score = alignment_score(
    aligned_seq_a,
    aligned_seq_b,
    match_score=1.0,
    mismatch_score=-1.0,
    indel_score=-1.0,
    gap="_",
)
print(score)


# See https://en.wikipedia.org/wiki/Hirschberg%27s_algorithm#Example
seq_a = ["A", "G", "T", "A", "C", "G", "C", "A"]
seq_b = ["T", "A", "T", "G", "C"]

aligned_seq_a, aligned_seq_b = hirschberg(
    seq_a,
    seq_b,
    match_score=2.0,
    mismatch_score=-1.0,
    indel_score=-2.0,
    gap="_",
)

# Expects ["A", "G", "T", "A", "C", "G", "C", "A"]
print(aligned_seq_a)

# Expects ["_", "_", "T", "A", "T", "G", "C", "_"]
print(aligned_seq_b)

# Expects 1
score = alignment_score(
    aligned_seq_a,
    aligned_seq_b,
    match_score=2.0,
    mismatch_score=-1.0,
    indel_score=-2.0,
    gap="_",
)
print(score)

Performance Benchmarks

All tests below were conducted sequentially on an AWS R5.4 instance with 16 cores and 128 GB of memory. Each pair of sequences for alignment consists of a character sequence of randomly selected A/C/G/T nucleotide bases and a copy of it in which 10% of the characters are randomly perturbed by deletion, by insertion of another randomly selected character after the entry, or by replacement with a different randomly selected character.

While sequence_align is comparable to some other toolkits in terms of speed, its memory performance is best-in-class, even when compared to toolkits using the same algorithm, such as pyseq-align, which also uses Needleman-Wunsch.

(Please note that some lines terminate early, as some toolkits took prohibitively long and/or ran out of memory at higher scales.)
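For readers who want to build similar test inputs, the perturbation procedure described above could be sketched as follows. This is not the benchmark code itself: the function name make_sequence_pair, the seed parameter, perturbing each position independently with probability 10% (rather than selecting exactly 10% of positions), and choosing uniformly among the three operations are all assumptions made for illustration.

```python
import random
from typing import List, Tuple

BASES = "ACGT"


def make_sequence_pair(
    length: int,
    perturb_rate: float = 0.10,
    seed: int = 0,
) -> Tuple[List[str], List[str]]:
    """Return (original, perturbed) lists of nucleotide characters.

    Illustrative helper; the sampling details are assumptions.
    """
    rng = random.Random(seed)
    original = [rng.choice(BASES) for _ in range(length)]
    perturbed: List[str] = []
    for base in original:
        if rng.random() >= perturb_rate:
            # Leave this position untouched.
            perturbed.append(base)
            continue
        operation = rng.choice(("delete", "insert", "replace"))
        if operation == "delete":
            # Drop the character entirely.
            continue
        if operation == "insert":
            # Keep the character and add a random one after it.
            perturbed.append(base)
            perturbed.append(rng.choice(BASES))
        else:
            # Swap in a different randomly selected base.
            perturbed.append(rng.choice([b for b in BASES if b != base]))
    return original, perturbed
```

The resulting lists of single-character strings have the same shape as the seq_a and seq_b inputs used in the Quick Start examples.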

License

Licensed under the Apache 2.0 License. Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Copyright 2023-present Kensho Technologies, LLC. The present date is determined by the timestamp of the most recent commit in the repository.


