SigAlign's language binding for Python
SigAlign for Python
SigAlign is a sequence alignment algorithm. This repository hosts the Python language bindings for SigAlign. The original project can be found here.
Requirements
- Python >= 3.7
Installation
- Via pip:
  pip install sigalign
- Manual build: SigAlign relies on maturin as a backend. To build manually:
  pip install maturin
  maturin develop
Usage Example
(1) Import SigAlign
from sigalign import Reference, Aligner
(2) Construct Reference
reference = Reference(
    targets = [
        ("target_1", "ACACAGATCGCAAACTCACAATTGTATTTCTTTGCCACCTGGGCATATACTTTTTGCGCCCCCTCATTTA"),
        ("target_2", "TCTGGGGCCATTGTATTTCTTTGCCAGCTGGGGCATATACTTTTTCCGCCCCCTCATTTACGCTCATCAC"),
    ],
    indexing_option = {
        "sasr": 2,       # Suffix array sampling ratio
        "lts": 10_000,   # Lookup table size (Bytes)
    },
)
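This page names sasr only as the suffix array sampling ratio. The sketch below illustrates what such a ratio generally controls. It builds a naive suffix array and keeps only the entries whose text position is a multiple of the ratio, so a ratio of 2 stores roughly half of them. Positions that are not stored would have to be recovered at query time, which trades speed for memory. Whether SigAlign samples in exactly this way is an assumption, and the function names are hypothetical.

```python
from typing import Dict, List


def build_suffix_array(text: str) -> List[int]:
    # Naive construction: sort every suffix start position by the suffix it begins.
    return sorted(range(len(text)), key=lambda i: text[i:])


def sample_suffix_array(sa: List[int], ratio: int) -> Dict[int, int]:
    # Hypothetical sampling scheme: keep only SA slots whose text position
    # is a multiple of `ratio`. Maps SA index -> text position.
    return {rank: pos for rank, pos in enumerate(sa) if pos % ratio == 0}


text = "ACACAGATCG$"  # first 10 bases of target_1 plus a sentinel
sa = build_suffix_array(text)
sampled = sample_suffix_array(sa, ratio=2)
print(len(sa), len(sampled))  # 11 6
```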
Alternatively, build the Reference from a FASTA file:
fasta_file_path = "./YOUR_REFERENCE.fa"
reference = Reference.from_fasta_file(fasta_file_path)
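Reference.from_fasta_file takes a path to a FASTA file. If your targets are already held in Python, a small helper can dump them in plain FASTA format, with a ">label" header line followed by the sequence. The file can then be loaded through that path. The write_fasta helper below is hypothetical, and the temporary path is only for the demo.

```python
import os
import tempfile
from typing import Iterable, Tuple


def write_fasta(path: str, records: Iterable[Tuple[str, str]]) -> None:
    # Hypothetical helper: one ">label" header line, then the sequence on a single line.
    with open(path, "w") as handle:
        for label, sequence in records:
            handle.write(f">{label}\n{sequence}\n")


targets = [
    ("target_1", "ACACAGATCGCAAACTCACAATTGTATTTCTTTGCCACCTGGGCATATACTTTTTGCGCCCCCTCATTTA"),
    ("target_2", "TCTGGGGCCATTGTATTTCTTTGCCAGCTGGGGCATATACTTTTTCCGCCCCCTCATTTACGCTCATCAC"),
]
# Temporary location for the demo; substitute your own path such as "./YOUR_REFERENCE.fa".
fasta_file_path = os.path.join(tempfile.mkdtemp(), "reference.fa")
write_fasta(fasta_file_path, targets)

with open(fasta_file_path) as handle:
    lines = handle.read().splitlines()
print(lines[0], lines[2])  # >target_1 >target_2
```

The resulting path can then be passed to Reference.from_fasta_file(fasta_file_path), as in the snippet above.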
Check the status of the Reference
print("# Reference Status")
print(f" - Num_targets: {reference.num_targets}")
print(f" - Estimated_total_bytes: {reference.estimated_size / 1024:.2f} KiB")
Output:
# Reference Status
 - Num_targets: 2
 - Estimated_total_bytes: 31.44 KiB
(3) Initialize Aligner
aligner = Aligner(
    4,    # Mismatch penalty
    6,    # Gap-open penalty
    2,    # Gap-extend penalty
    50,   # Minimum length
    0.2,  # Maximum penalty per length
)
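The last two arguments act as similarity cutoffs. The function below assumes they work the way their names suggest: an alignment is kept only if its length is at least the minimum length and its penalty divided by its length does not exceed the maximum penalty per length. passes_cutoffs is a hypothetical helper that encodes this reading of the parameter names. It is not code taken from SigAlign.

```python
def passes_cutoffs(
    penalty: int,
    length: int,
    min_length: int = 50,
    max_penalty_per_length: float = 0.2,
) -> bool:
    # Hypothetical check mirroring the Aligner's two cutoffs (assumed semantics):
    # the alignment must be long enough, and its penalty per aligned
    # column must not exceed the limit.
    if length < min_length:
        return False
    return penalty / length <= max_penalty_per_length


# The first two penalty/length pairs come from the example output further down this page.
print(passes_cutoffs(8, 60))   # True  (8 / 60 ≈ 0.133)
print(passes_cutoffs(8, 51))   # True  (8 / 51 ≈ 0.157)
print(passes_cutoffs(8, 30))   # False (shorter than 50)
print(passes_cutoffs(12, 50))  # False (12 / 50 = 0.24)
```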
Check the status of the Aligner
print("# Aligner Status")
print(" - Penalties")
print(f" - Mismatch penalty: {aligner.px}")
print(f" - Gap-open penalty: {aligner.po}")
print(f" - Gap-extend penalty: {aligner.pe}")
print(" - Similarity Cutoffs")
print(f" - Minimum length: {aligner.ml}")
print(f" - Maximum penalty per length: {aligner.mpl:.1f}")
print(f" - Mode is {'Local' if aligner.is_local_mode else 'Semi-global'}")
Output:
# Aligner Status
 - Penalties
 - Mismatch penalty: 4
 - Gap-open penalty: 6
 - Gap-extend penalty: 2
 - Similarity Cutoffs
 - Minimum length: 50
 - Maximum penalty per length: 0.2
 - Mode is Local
(4) Execute Alignment
query = "CAAACTCACAATTGTATTTCTTTGCCAGCTGGGCATATACTTTTTCCGCCCCCTCATTTAACTTCTTGGA"
results = aligner.align_query(reference, query)
Alternatively, align the queries stored in a FASTA file:
fasta_file_path = "./YOUR_QUERY.fa"
results = aligner.align_fasta_file(
    reference,
    fasta_file_path,
)
(5) Display Results
for target_result in results:
    print(f"# Target index: {target_result.index}")
    for idx, alignment in enumerate(target_result.alignments):
        print(f" - Result: {idx+1}")
        print(f" - Penalty: {alignment.penalty}")
        print(f" - Length: {alignment.length}")
        print(f" - Query position: {alignment.query_position}")
        print(f" - Target position: {alignment.target_position}")
Output:
# Target index: 0
 - Result: 1
 - Penalty: 8
 - Length: 60
 - Query position: (0, 60)
 - Target position: (10, 70)
# Target index: 1
 - Result: 1
 - Penalty: 8
 - Length: 51
 - Query position: (10, 60)
 - Target position: (9, 60)
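Positions are reported as (start, end) pairs. In the first result, a length-60 alignment spans query (0, 60), which suggests zero-based, end-exclusive ranges that map directly onto Python slices. The sketch below works under that assumption. It slices both aligned regions of the first hit and lists the columns where the bases differ. The mismatch_columns helper is hypothetical, and the column-by-column comparison only applies to gap-free alignments such as this one.

```python
from typing import List, Tuple

query = "CAAACTCACAATTGTATTTCTTTGCCAGCTGGGCATATACTTTTTCCGCCCCCTCATTTAACTTCTTGGA"
target_1 = "ACACAGATCGCAAACTCACAATTGTATTTCTTTGCCACCTGGGCATATACTTTTTGCGCCCCCTCATTTA"


def mismatch_columns(
    query: str,
    target: str,
    query_position: Tuple[int, int],
    target_position: Tuple[int, int],
) -> List[int]:
    # Assumes zero-based, end-exclusive coordinates and an alignment without gaps.
    q = query[query_position[0]:query_position[1]]
    t = target[target_position[0]:target_position[1]]
    if len(q) != len(t):
        raise ValueError("gapped alignment: column-by-column comparison does not apply")
    return [i for i, (a, b) in enumerate(zip(q, t)) if a != b]


cols = mismatch_columns(query, target_1, (0, 60), (10, 70))
print(cols)  # [27, 45]
```

Two mismatches at a mismatch penalty of 4 account for the reported penalty of 8. The columns also line up with the M27, S1, M17, S1, M14 runs in the JSON output further down.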
Convert results to JSON or a dict
import json
json.loads(results.to_json())
Output:
[{'index': 0, 'label': 'target_1', 'alignments': [{'penalty': 8, 'length': 60, 'query_position': [0, 60], 'target_position': [10, 70], 'operations': [{'case': 'M', 'count': 27}, {'case': 'S', 'count': 1}, {'case': 'M', 'count': 17}, {'case': 'S', 'count': 1}, {'case': 'M', 'count': 14}]}]}, {'index': 1, 'label': 'target_2', 'alignments': [{'penalty': 8, 'length': 51, 'query_position': [10, 60], 'target_position': [9, 60], 'operations': [{'case': 'M', 'count': 23}, {'case': 'I', 'count': 1}, {'case': 'M', 'count': 27}]}]}]
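The operations list run-length encodes each alignment. In this output, M runs are matches, S runs are substitutions (mismatches) and I runs are gaps. The helper below assumes an affine gap model in which each gap run costs the gap-open penalty plus the gap-extend penalty per base. That model reproduces both penalties shown here: two substitutions give 2 × 4 = 8, and a one-base gap gives 6 + 2 × 1 = 8. score_operations is hypothetical, and treating D runs the same way as I runs is also an assumption.

```python
from typing import Dict, List, Tuple


def score_operations(
    operations: List[Dict[str, object]], px: int = 4, po: int = 6, pe: int = 2
) -> Tuple[int, int]:
    # Hypothetical re-scoring under an assumed affine gap model:
    #   M (match) costs 0, S (substitution) costs px per base,
    #   each I/D run costs po + pe * run_length.
    penalty = 0
    length = 0
    for op in operations:
        case, count = op["case"], int(op["count"])
        length += count
        if case == "S":
            penalty += px * count
        elif case in ("I", "D"):
            penalty += po + pe * count
    return penalty, length


# Trimmed copy of the json.loads(results.to_json()) output shown above.
parsed = [
    {"label": "target_1", "alignments": [{"penalty": 8, "length": 60, "operations": [
        {"case": "M", "count": 27}, {"case": "S", "count": 1}, {"case": "M", "count": 17},
        {"case": "S", "count": 1}, {"case": "M", "count": 14}]}]},
    {"label": "target_2", "alignments": [{"penalty": 8, "length": 51, "operations": [
        {"case": "M", "count": 23}, {"case": "I", "count": 1}, {"case": "M", "count": 27}]}]},
]

for target in parsed:
    for alignment in target["alignments"]:
        recomputed = score_operations(alignment["operations"])
        print(target["label"], recomputed, recomputed == (alignment["penalty"], alignment["length"]))
```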
Convert results to a table
import pandas as pd
df = pd.DataFrame(
    results.to_table(),
    columns = [
        'index', 'label', 'penalty', 'length',
        'qstart', 'qend', 'tstart', 'tend', 'operations',
    ]
)
df
Output:
index | label | penalty | length | qstart | qend | tstart | tend | operations
---|---|---|---|---|---|---|---|---
0 | target_1 | 8 | 60 | 0 | 60 | 10 | 70 | M27S1M17S1M14
1 | target_2 | 8 | 51 | 10 | 60 | 9 | 60 | M23I1M27
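In the table form, each alignment's operations are packed into a single string such as M27S1M17S1M14. These appear to be the same runs as in the JSON output, written as a letter followed by a count. The regular-expression sketch below unpacks such strings back into (case, count) pairs, for example to count substituted bases per row. The parser is illustrative, and the letter-then-count format is inferred from the output above.

```python
import re
from typing import List, Tuple

OP_PATTERN = re.compile(r"([A-Z])(\d+)")


def parse_operations(ops: str) -> List[Tuple[str, int]]:
    # Split a letter-then-count string like "M23I1M27" into runs.
    runs = [(case, int(count)) for case, count in OP_PATTERN.findall(ops)]
    # Reject strings containing characters the pattern silently skipped.
    if "".join(f"{case}{count}" for case, count in runs) != ops:
        raise ValueError(f"unrecognised operations string: {ops!r}")
    return runs


for ops in ["M27S1M17S1M14", "M23I1M27"]:
    runs = parse_operations(ops)
    substitutions = sum(count for case, count in runs if case == "S")
    print(ops, runs, substitutions)
```

With pandas, df["operations"].map(parse_operations) would apply the same parser to the whole column.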
Additional Information
This Python library binds the default wrappers of the Aligner and Reference structs from the original library. It provides a set of functions sufficient for most common tasks. However, if you wish to customize or optimize the Aligner and Reference, you can define your own versions and call them from Python through a Foreign Function Interface (FFI).
Support
For any questions or issues, please refer to the original project's GitHub issue tracker.
License
SigAlign for Python is released under the MIT license. For more details, see LICENSE.
Built Distributions
Hashes for sigalign-0.2.0-cp37-abi3-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | 01c38c0d6431b7935dbd40aebeaf6316fba78c9391965be8570b077009a53d36
MD5 | ae4640aba4fe08143653779ac5bd7b8f
BLAKE2b-256 | dd886263a22a1d9ae0a2e3018e7e2f31a7ccdc5f089ee53d380825e417f86334
Hashes for sigalign-0.2.0-cp37-abi3-manylinux_2_28_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 6a7b99ea62dd21a4b2d8980c275bd7aaeceaae588fc537387c201eefebaff408
MD5 | f5faf9be1da6903c6a84f9a6a69ca6db
BLAKE2b-256 | 5eb31a42754c5f04357431f24f7c0ab38202dce9c457a3639f3e4cef5e69b60e
Hashes for sigalign-0.2.0-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 2451765efc3c1f7bbd9b1cd65f1acf59324922159f6e295536af523592003c2a
MD5 | a9008466b54454a468678e07c589e434
BLAKE2b-256 | ef701d9e88411052575db95b385d3d7d781c8a427d43e55cf01153767fccfc12
Hashes for sigalign-0.2.0-cp37-abi3-macosx_10_7_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 3967eb63f8cefe50d1382911d5db213598f1847c1c965394af903379ecfe9bc6
MD5 | 7ca44d5218aa9eaefd308fbb8a7c46cb
BLAKE2b-256 | 71feb569f50d3ddcb2ed080bdf4c270bdad1bd6b7ad80a1d23209e90a3b7c99f