SigAlign's language binding for Python
SigAlign for Python
SigAlign is a sequence alignment algorithm. This repository hosts the Python language bindings for SigAlign. The original project can be found here.
Requirements
- Python >= 3.10
Installation
- Via pip:
  pip install sigalign
- Manual build: SigAlign relies on maturin as a backend. To build manually:
  pip install maturin
  maturin develop
Usage Example
(1) Import SigAlign
from sigalign import Reference, Aligner
(2) Construct Reference
# Build Reference object from `iterable` of tuples (label, sequence).
reference = Reference.from_iterable([
("target_1", "ACACAGATCGCAAACTCACAATTGTATTTCTTTGCCACCTGGGCATATACTTTTTGCGCCCCCTCATTTA"),
("target_2", "TCTGGGGCCATTGTATTTCTTTGCCAGCTGGGGCATATACTTTTTCCGCCCCCTCATTTACGCTCATCAC"),
])
# Or only sequences
reference = Reference.from_iterable([
"ACACAGATCGCAAACTCACAATTGTATTTCTTTGCCACCTGGGCATATACTTTTTGCGCCCCCTCATTTA",
"TCTGGGGCCATTGTATTTCTTTGCCAGCTGGGGCATATACTTTTTCCGCCCCCTCATTTACGCTCATCAC",
])
# Bytes can be used instead of strings
reference = Reference.from_iterable([
b"ACACAGATCGCAAACTCACAATTGTATTTCTTTGCCACCTGGGCATATACTTTTTGCGCCCCCTCATTTA",
b"TCTGGGGCCATTGTATTTCTTTGCCAGCTGGGGCATATACTTTTTCCGCCCCCTCATTTACGCTCATCAC",
])
# FASTA format can be used
reference = Reference.from_fasta(b""">target_1
ACACAGATCGCAAACTCACAATTGTATTTCTTTGCCACCTGGGCATATACTTTTTGCGCCCCCTCATTTA
>target_2
TCTGGGGCCATTGTATTTCTTTGCCAGCTGGGGCATATACTTTTTCCGCCCCCTCATTTACGCTCATCAC""")
# Or from a file
# reference = Reference.from_fasta_file("reference.fasta")
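When the FASTA text is already in memory, the (label, sequence) tuples that `Reference.from_iterable` accepts can also be produced with a few lines of plain Python. The helper below is a hypothetical sketch and not part of SigAlign. It assumes the label is the first whitespace-delimited token of each header line, which may differ from how `Reference.from_fasta` derives labels.

```python
from typing import Iterator, Tuple


def iter_fasta_records(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (label, sequence) tuples from FASTA-formatted text.

    Hypothetical helper for illustration; not part of the sigalign package.
    """
    label = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if label is not None:
                yield label, "".join(chunks)
            header = line[1:].split()
            # Assumption: the label is the first token of the header line.
            label = header[0] if header else ""
            chunks = []
        elif label is not None:
            # Sequence lines of one record may be wrapped; join them.
            chunks.append(line)
    if label is not None:
        yield label, "".join(chunks)


records = list(iter_fasta_records(""">target_1
ACAC
AGAT
>target_2
TCTG"""))
print(records)  # [('target_1', 'ACACAGAT'), ('target_2', 'TCTG')]
```

The resulting list has the same shape as the first `Reference.from_iterable` example above.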
Check status of Reference
print("# Reference Status")
print(f" - Num targets: {reference.num_targets}")
print(f" - Total length: {reference.total_length} bps")
print(f" - Estimated size: {reference.estimated_size / 1024:.2f} KiB")
- Output:
# Reference Status
  - Num targets: 2
  - Total length: 140 bps
  - Estimated size: 1.32 KiB
Parse target label and sequence
for target_index in range(reference.num_targets):
    print(f"# Target {target_index}")
    print(f" - Label: {reference.get_label(target_index)}")
    print(f" - Sequence: {reference.get_sequence(target_index)}")
- Output:
# Target 0
  - Label: target_1
  - Sequence: ACACAGATCGCAAACTCACAATTGTATTTCTTTGCCACCTGGGCATATACTTTTTGCGCCCCCTCATTTA
# Target 1
  - Label: target_2
  - Sequence: TCTGGGGCCATTGTATTTCTTTGCCAGCTGGGGCATATACTTTTTCCGCCCCCTCATTTACGCTCATCAC
Save and load
# Save
reference.save_to_file("reference.sigref")
# Load
reference = Reference.load_from_file("reference.sigref")
(3) Initialize Aligner
aligner = Aligner(
    4,    # Mismatch penalty
    6,    # Gap-open penalty
    2,    # Gap-extend penalty
    50,   # Minimum length
    0.2,  # Maximum penalty per length
    use_local_mode=True,  # Use local alignment (default: True)
    use_limit=None,       # Limit the number of alignments (default: None)
    use_chunk=None,       # Align the query in chunks, given as (chunk size, sliding window size) (default: None)
)
Check status of Aligner
print("# Aligner Status")
print(" - Penalties")
print(f" - Mismatch penalty: {aligner.px}")
print(f" - Gap-open penalty: {aligner.po}")
print(f" - Gap-extend penalty: {aligner.pe}")
print(" - Similarity Cutoffs")
print(f" - Minimum length: {aligner.minl}")
print(f" - Maximum penalty per length: {aligner.maxp:.2f}")
print(f" - Mode is {'Local' if aligner.is_local_mode else 'Semi-global'}")
print(f" - Max alignments: {'Infinity' if aligner.limitation is None else aligner.limitation}")
print(f" - Chunk: {aligner.chunk}")
- Output:
# Aligner Status
  - Penalties
    - Mismatch penalty: 4
    - Gap-open penalty: 6
    - Gap-extend penalty: 2
  - Similarity Cutoffs
    - Minimum length: 50
    - Maximum penalty per length: 0.20
  - Mode is Local
  - Max alignments: Infinity
  - Chunk: None
(4) Perform Alignment
# Align a query str to the reference
query = "CAAACTCACAATTGTATTTCTTTGCCAGCTGGGCATATACTTTTTCCGCCCCCTCATTTAACTTCTTGGA"
results = aligner.align_query(query, reference)
# Or query bytes can be used
query = b"CAAACTCACAATTGTATTTCTTTGCCAGCTGGGCATATACTTTTTCCGCCCCCTCATTTAACTTCTTGGA"
# Including labels is slightly slower than omitting them (default: False)
results = aligner.align_query(query, reference, with_label=True)
# FASTA (str or bytes) can be used
fasta = b""">query_1
CAAACTCACAATTGTATTTCTTTGCCAGCTGGGCATATACTTTTTCCGCCCCCTCATTTAACTTCTTGGA"""
results = aligner.align_fasta(
    fasta,
    reference,
)
# Or a file can be used:
# results = aligner.align_fasta_file(
#     "path/to/file.fasta",
#     reference,
# )
# FASTQ (str or bytes) can be used:
fastq = b"""@query_1
CAAACTCACAATTGTATTTCTTTGCCAGCTGGGCATATACTTTTTCCGCCCCCTCATTTAACTTCTTGGA
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII"""
results = aligner.align_fastq(
    fastq,
    reference,
    with_label=True,  # include label in the result (default: False)
    with_reverse_complementary=False,  # align both the forward read and its reverse complement (default: False)
    allow_interrupt=True,  # allow interrupting with KeyboardInterrupt (default: False)
)
# Or a file can be used:
# results = aligner.align_fastq_file(
#     "path/to/file.fastq",
#     reference,
# )
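For readers unfamiliar with the `with_reverse_complementary` option, the following sketch shows what the reverse complement of a DNA read is. It is a plain-Python illustration only: `reverse_complement` is a hypothetical helper, and this document does not describe how SigAlign computes the reverse complement internally or how it treats characters other than A, C, G, and T (here they are left unchanged).

```python
# Translation table mapping each base to its complement.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(sequence: str) -> str:
    """Return the reverse complement of a DNA sequence.

    Hypothetical helper for illustration; not part of the sigalign package.
    Characters outside A/C/G/T (for example N) are kept as they are.
    """
    return sequence.translate(_COMPLEMENT)[::-1]


print(reverse_complement("CAAACT"))  # AGTTTG
```

Applying the function twice returns the original sequence, which is a quick sanity check for any implementation.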
(5) Display Results
for read_alignment in results:
    print(f"# Query: {read_alignment.read} (is forward: {read_alignment.is_forward})")
    for target_alignment in read_alignment.result:
        print(f" - Target: {target_alignment.label} (index {target_alignment.index})")
        for idx, alignment in enumerate(target_alignment.alignments):
            print(f" - Result {idx+1}")
            print(f" * Penalty: {alignment.penalty}")
            print(f" * Length: {alignment.length}")
            print(f" * Query position: {alignment.query_position}")
            print(f" * Target position: {alignment.target_position}")
- Output:
# Query: query_1 (is forward: True)
  - Target: target_1 (index 0)
    - Result 1
      * Penalty: 8
      * Length: 60
      * Query position: (0, 60)
      * Target position: (10, 70)
  - Target: target_2 (index 1)
    - Result 1
      * Penalty: 8
      * Length: 51
      * Query position: (10, 60)
      * Target position: (9, 60)
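Both reported alignments can be checked by hand against the two similarity cutoffs passed to `Aligner` (minimum length 50, maximum penalty per length 0.2). The sketch below is a plain-Python check, not a SigAlign API. It assumes both comparisons are inclusive, which this document does not state.

```python
def passes_cutoffs(
    penalty: int,
    length: int,
    min_length: int = 50,
    max_penalty_per_length: float = 0.2,
) -> bool:
    """Check an alignment against the two similarity cutoffs.

    Hypothetical helper; inclusive comparisons are an assumption.
    """
    if length <= 0:
        return False
    return length >= min_length and penalty / length <= max_penalty_per_length


# (label, penalty, length) taken from the output above.
for label, penalty, length in [("target_1", 8, 60), ("target_2", 8, 51)]:
    print(label, round(penalty / length, 3), passes_cutoffs(penalty, length))
# target_1 0.133 True
# target_2 0.157 True
```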
Convert results to JSON or dict
import json
json.loads(results.to_json())
- Output:
[{'read': 'query_1',
  'is_forward': True,
  'result': [
    {'index': 0,
     'label': 'target_1',
     'alignments': [
       {'penalty': 8,
        'length': 60,
        'query_position': [0, 60],
        'target_position': [10, 70],
        'operations': [
          {'operation': 'Match', 'count': 27},
          {'operation': 'Subst', 'count': 1},
          {'operation': 'Match', 'count': 17},
          {'operation': 'Subst', 'count': 1},
          {'operation': 'Match', 'count': 14}]}]},
    {'index': 1,
     'label': 'target_2',
     'alignments': [
       {'penalty': 8,
        'length': 51,
        'query_position': [10, 60],
        'target_position': [9, 60],
        'operations': [
          {'operation': 'Match', 'count': 23},
          {'operation': 'Deletion', 'count': 1},
          {'operation': 'Match', 'count': 27}]}]}]}]
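The `penalty` and `length` fields can be reproduced from the `operations` list and the penalties passed to `Aligner` (mismatch 4, gap-open 6, gap-extend 2). The sketch below assumes that a gap of n bases costs gap-open plus n times gap-extend. That formula is not stated in this document, but it reproduces both reported penalties. The `"Insertion"` operation name is also an assumption by analogy with `"Deletion"`; only `Match`, `Subst`, and `Deletion` appear in the output above.

```python
def penalty_from_operations(operations, mismatch=4, gap_open=6, gap_extend=2):
    """Recompute an alignment penalty from its operation list.

    Hypothetical helper; the affine gap formula is an assumption.
    """
    penalty = 0
    for op in operations:
        kind, count = op["operation"], op["count"]
        if kind == "Subst":
            penalty += mismatch * count
        elif kind in ("Insertion", "Deletion"):
            # Assumed gap cost: one opening plus one extension per gapped base.
            penalty += gap_open + gap_extend * count
        # "Match" contributes no penalty.
    return penalty


def aligned_length(operations):
    """Sum the operation counts, which matches the reported 'length' here."""
    return sum(op["count"] for op in operations)


# Operation lists copied from the output above.
target_1_ops = [
    {"operation": "Match", "count": 27},
    {"operation": "Subst", "count": 1},
    {"operation": "Match", "count": 17},
    {"operation": "Subst", "count": 1},
    {"operation": "Match", "count": 14},
]
target_2_ops = [
    {"operation": "Match", "count": 23},
    {"operation": "Deletion", "count": 1},
    {"operation": "Match", "count": 27},
]

print(penalty_from_operations(target_1_ops), aligned_length(target_1_ops))  # 8 60
print(penalty_from_operations(target_2_ops), aligned_length(target_2_ops))  # 8 51
```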
Convert results to a table
import pandas as pd
df = pd.DataFrame(
    results.to_rows(),
    columns=[
        'query_label', 'is_forward',
        'target_index', 'target_label', 'penalty', 'length',
        'query_start', 'query_end', 'target_start', 'target_end', 'operations',
    ],
)
df
- Output:
   query_label  is_forward  target_index  target_label  penalty  length  query_start  query_end  target_start  target_end  CIGAR
0  query_1      TRUE        1             target_2      8        51      10           60         9             60          23=1D27=
1  query_1      TRUE        0             target_1      8        60      0            60         10            70          27=1X17=1X14=
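The CIGAR strings in the last column correspond one-to-one to the `operations` lists in the JSON output. A minimal sketch of that mapping follows. The symbols `=`, `X`, and `D` are taken from the strings shown in this document. The `"Insertion"` name and its `I` symbol are assumptions, since no insertion appears in the example output.

```python
# Operation name -> CIGAR symbol. "Insertion": "I" is an assumption.
_CIGAR_SYMBOLS = {"Match": "=", "Subst": "X", "Deletion": "D", "Insertion": "I"}


def operations_to_cigar(operations):
    """Render a list of {'operation', 'count'} dicts as a CIGAR string.

    Hypothetical helper for illustration; not part of the sigalign package.
    """
    return "".join(
        f"{op['count']}{_CIGAR_SYMBOLS[op['operation']]}" for op in operations
    )


cigar_1 = operations_to_cigar([
    {"operation": "Match", "count": 27},
    {"operation": "Subst", "count": 1},
    {"operation": "Match", "count": 17},
    {"operation": "Subst", "count": 1},
    {"operation": "Match", "count": 14},
])
cigar_2 = operations_to_cigar([
    {"operation": "Match", "count": 23},
    {"operation": "Deletion", "count": 1},
    {"operation": "Match", "count": 27},
])
print(cigar_1)  # 27=1X17=1X14=
print(cigar_2)  # 23=1D27=
```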
import polars as pl
df = pl.DataFrame(
    results.to_rows(),
    orient="row",
    schema=[
        'query_label', 'is_forward',
        'target_index', 'target_label', 'penalty', 'length',
        'query_start', 'query_end', 'target_start', 'target_end', 'operations',
    ],
)
df
- Output:
query_label  is_forward  target_index  target_label  penalty  length  query_start  query_end  target_start  target_end  CIGAR
str          bool        i64           str           i64      i64     i64          i64        i64           i64         str
"query_1"    true        1             "target_2"    8        51      10           60         9             60          "23=1D27="
"query_1"    true        0             "target_1"    8        60      0            60         10            70          "27=1X17=1X14="
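Once the results are in a table, the CIGAR strings can be checked against the start and end columns. The sketch below parses a CIGAR string and counts how many bases it consumes on the query and on the target, using the usual convention that `=` and `X` consume both, `I` consumes only the query, and `D` consumes only the target. `cigar_spans` is a hypothetical helper, and it assumes that only these four symbols occur.

```python
import re

_CIGAR_TOKEN = re.compile(r"(\d+)([=XID])")
_CIGAR_FULL = re.compile(r"(?:\d+[=XID])+")


def cigar_spans(cigar: str) -> tuple[int, int]:
    """Return (query_span, target_span) consumed by a CIGAR string.

    Hypothetical helper; assumes only the symbols =, X, I, and D occur.
    """
    if not _CIGAR_FULL.fullmatch(cigar):
        raise ValueError(f"unsupported CIGAR string: {cigar!r}")
    query_span = target_span = 0
    for count, symbol in _CIGAR_TOKEN.findall(cigar):
        n = int(count)
        if symbol in "=X":
            query_span += n
            target_span += n
        elif symbol == "I":
            query_span += n  # extra bases in the query
        else:  # "D"
            target_span += n  # bases present only in the target
    return query_span, target_span


print(cigar_spans("27=1X17=1X14="))  # (60, 60)
print(cigar_spans("23=1D27="))       # (50, 51)
```

These spans agree with the table: for target_1 the query covers 0 to 60 and the target 10 to 70, and for target_2 the query covers 10 to 60 (50 bases) and the target 9 to 60 (51 bases).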
Additional Information
This Python library provides bindings for the Rust crate sigalign. It offers a set of functions sufficient for most common tasks. However, for more customization, using the Rust crate directly is recommended.
Support
For any questions or issues, please refer to the original project's GitHub issue tracker.
License
SigAlign for Python is released under the MIT license.
Citation
Bahk, K., & Sung, J. (2024). SigAlign: an alignment algorithm guided by explicit similarity criteria. Nucleic Acids Research, gkae607.