SigAlign's language binding for Python

SigAlign for Python

SigAlign is a sequence alignment algorithm. This repository hosts its Python language bindings; the core library is maintained in the original SigAlign project.

Requirements

  • Python >= 3.7

Installation

  • Via pip

    pip install sigalign
    
  • Manual build: SigAlign uses maturin as its build backend. To build from source:

    pip install maturin
    maturin develop
    

Usage Example

(1) Import SigAlign

from sigalign import Reference, Aligner

(2) Construct Reference

reference = Reference(
    targets = [
        ("target_1", "ACACAGATCGCAAACTCACAATTGTATTTCTTTGCCACCTGGGCATATACTTTTTGCGCCCCCTCATTTA"),
        ("target_2", "TCTGGGGCCATTGTATTTCTTTGCCAGCTGGGGCATATACTTTTTCCGCCCCCTCATTTACGCTCATCAC"),
    ],
    indexing_option={
        "sasr": 2,      # Suffix array sampling ratio
        "lts": 10_000,  # Lookup table size (Bytes)
    },
)

Alternatively, build from FASTA file

fasta_file_path = "./YOUR_REFERENCE.fa"
reference = Reference.from_fasta_file(fasta_file_path)
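
To try the FASTA route without a reference file at hand, the two targets from step (2) can first be written out as FASTA. The following is a minimal sketch using only the standard library; `write_fasta`, the line width, and the file name are illustrative assumptions and are not part of SigAlign.

```python
import os
import tempfile


def write_fasta(path, records, width=60):
    """Write (label, sequence) pairs to `path` in FASTA format.

    Hypothetical helper for illustration; not a SigAlign function.
    """
    with open(path, "w") as handle:
        for label, sequence in records:
            handle.write(f">{label}\n")
            # Wrap each sequence to a fixed line width, as FASTA files commonly do.
            for start in range(0, len(sequence), width):
                handle.write(sequence[start:start + width] + "\n")


targets = [
    ("target_1", "ACACAGATCGCAAACTCACAATTGTATTTCTTTGCCACCTGGGCATATACTTTTTGCGCCCCCTCATTTA"),
    ("target_2", "TCTGGGGCCATTGTATTTCTTTGCCAGCTGGGGCATATACTTTTTCCGCCCCCTCATTTACGCTCATCAC"),
]

# Write into a fresh temporary directory so no existing file is overwritten.
fasta_file_path = os.path.join(tempfile.mkdtemp(), "example_reference.fa")
write_fasta(fasta_file_path, targets)
```

The resulting `fasta_file_path` can then be passed to `Reference.from_fasta_file` exactly as in the snippet above.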

Check status of Reference

print("# Reference Status")
print(f" - Num_targets: {reference.num_targets}")
print(f" - Estimated_total_bytes: {reference.estimated_size / 1024:.2f} KiB")
  • Output:

    # Reference Status
     - Num_targets: 2
     - Estimated_total_bytes: 31.44 KiB
    

(3) Initialize Aligner

aligner = Aligner(
    4,     # Mismatch penalty
    6,     # Gap-open penalty
    2,     # Gap-extend penalty
    50,    # Minimum length
    0.2,   # Maximum penalty per length
)

Check status of Aligner

print("# Aligner Status")
print("  - Penalties")
print(f"    - Mismatch penalty: {aligner.px}")
print(f"    - Gap-open penalty: {aligner.po}")
print(f"    - Gap-extend penalty: {aligner.pe}")
print("  - Similarity Cutoffs")
print(f"    - Minimum length: {aligner.ml}")
print(f"    - Maximum penalty per length: {aligner.mpl:.1f}")
print(f"  - Mode is {'Local' if aligner.is_local_mode else 'Semi-global'}")
  • Output:

    # Aligner Status
      - Penalties
        - Mismatch penalty: 4
        - Gap-open penalty: 6
        - Gap-extend penalty: 2
      - Similarity Cutoffs
        - Minimum length: 50
        - Maximum penalty per length: 0.2
      - Mode is Local
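
The two similarity cutoffs can be read as a filter on each candidate alignment. The sketch below assumes that an alignment is kept only when its length is at least the minimum length and its penalty divided by its length does not exceed the maximum penalty per length. The function name `passes_cutoffs` is hypothetical, and the inclusive boundary handling is an assumption rather than documented SigAlign behavior.

```python
def passes_cutoffs(penalty, length, minimum_length=50, max_penalty_per_length=0.2):
    """Return True if an alignment satisfies both similarity cutoffs.

    Assumption: both comparisons are inclusive.
    """
    if length < minimum_length:
        return False
    return penalty / length <= max_penalty_per_length


print(passes_cutoffs(penalty=8, length=60))   # 8 / 60 is about 0.133
print(passes_cutoffs(penalty=8, length=40))   # shorter than the minimum length
print(passes_cutoffs(penalty=16, length=60))  # 16 / 60 is about 0.267
```

With the defaults used here (minimum length 50, maximum penalty per length 0.2), only the first call passes.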
    

(4) Execute Alignment

query = "CAAACTCACAATTGTATTTCTTTGCCAGCTGGGCATATACTTTTTCCGCCCCCTCATTTAACTTCTTGGA"

results = aligner.align_query(reference, query)

Alternatively, use FASTA file

fasta_file_path = "./YOUR_QUERY.fa"
results = aligner.align_fasta_file(
    reference,
    fasta_file_path,
)

(5) Display Results

for target_result in results:
    print(f"# Target index: {target_result.index}")
    for idx, alignment in enumerate(target_result.alignments):
        print(f"  - Result: {idx+1}")
        print(f"    - Penalty: {alignment.penalty}")
        print(f"    - Length: {alignment.length}")
        print(f"    - Query position: {alignment.query_position}")
        print(f"    - Target position: {alignment.target_position}")
  • Output:

    # Target index: 0
      - Result: 1
        - Penalty: 8
        - Length: 60
        - Query position: (0, 60)
        - Target position: (10, 70)
    # Target index: 1
      - Result: 1
        - Penalty: 8
        - Length: 51
        - Query position: (10, 60)
        - Target position: (9, 60)
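
The positions in this output are consistent with 0-based, half-open ranges: slicing the query and `target_1` from step (2) with them yields two 60-base substrings that differ at only two sites. A small sketch of that check, using plain string slicing (the variable names are illustrative, and the half-open reading is inferred from the output rather than stated by SigAlign):

```python
query = "CAAACTCACAATTGTATTTCTTTGCCAGCTGGGCATATACTTTTTCCGCCCCCTCATTTAACTTCTTGGA"
target_1 = "ACACAGATCGCAAACTCACAATTGTATTTCTTTGCCACCTGGGCATATACTTTTTGCGCCCCCTCATTTA"

# Positions reported for the alignment against target index 0.
query_position = (0, 60)
target_position = (10, 70)

# Assumption: positions are 0-based and half-open, like Python slices.
aligned_query = query[query_position[0]:query_position[1]]
aligned_target = target_1[target_position[0]:target_position[1]]

# Both slices have the same length, so they can be compared column by column.
mismatch_sites = [
    i for i, (q, t) in enumerate(zip(aligned_query, aligned_target)) if q != t
]

print(len(aligned_query), len(aligned_target))
print(mismatch_sites)
```

Two mismatches at a mismatch penalty of 4 account for the reported penalty of 8.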
    

Convert results to JSON or dict

import json
json.loads(results.to_json())
  • Output:

    [{'index': 0,
    'label': 'target_1',
    'alignments': [{'penalty': 8,
        'length': 60,
        'query_position': [0, 60],
        'target_position': [10, 70],
        'operations': [{'case': 'M', 'count': 27},
        {'case': 'S', 'count': 1},
        {'case': 'M', 'count': 17},
        {'case': 'S', 'count': 1},
        {'case': 'M', 'count': 14}]}]},
    {'index': 1,
    'label': 'target_2',
    'alignments': [{'penalty': 8,
        'length': 51,
        'query_position': [10, 60],
        'target_position': [9, 60],
        'operations': [{'case': 'M', 'count': 23},
        {'case': 'I', 'count': 1},
        {'case': 'M', 'count': 27}]}]}]
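
The `operations` list is enough to recompute the reported `length` and `penalty`. The sketch below assumes that `M` is a match, `S` is a substitution, and any other case (such as `I`) is a gap run that costs the gap-open penalty once plus the gap-extend penalty per base. This reading reproduces the values in the output above, but it is inferred from that output rather than taken from the SigAlign specification, and `summarize_operations` is a hypothetical helper.

```python
def summarize_operations(operations, mismatch=4, gap_open=6, gap_extend=2):
    """Return (length, penalty, compact_string) for a list of operation dicts.

    Assumptions: 'M' is free, 'S' costs `mismatch` per base, and every other
    case is a gap run costing `gap_open + gap_extend * count`.
    """
    length = 0
    penalty = 0
    compact = []
    for op in operations:
        case, count = op["case"], op["count"]
        length += count
        if case == "S":
            penalty += mismatch * count
        elif case != "M":
            penalty += gap_open + gap_extend * count
        compact.append(f"{case}{count}")
    return length, penalty, "".join(compact)


target_1_ops = [
    {"case": "M", "count": 27},
    {"case": "S", "count": 1},
    {"case": "M", "count": 17},
    {"case": "S", "count": 1},
    {"case": "M", "count": 14},
]
target_2_ops = [
    {"case": "M", "count": 23},
    {"case": "I", "count": 1},
    {"case": "M", "count": 27},
]

print(summarize_operations(target_1_ops))
print(summarize_operations(target_2_ops))
```

Under these assumptions, both alignments come out with a penalty of 8, at lengths 60 and 51 respectively.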
    

Convert results to a table

import pandas as pd
df = pd.DataFrame(
    results.to_table(),
    columns = [
        'index', 'label', 'penalty', 'length',
        'qstart', 'qend', 'tstart', 'tend', 'operations',
    ]
)
df
  • Output:

    index  label     penalty  length  qstart  qend  tstart  tend  operations
    0      target_1  8        60      0       60    10      70    M27S1M17S1M14
    1      target_2  8        51      10      60    9       60    M23I1M27
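
The `operations` column packs each alignment into a compact string of case letters followed by counts. A minimal sketch for splitting such a string back into (case, count) pairs with a regular expression, assuming every case is a single uppercase letter as in the rows above (`parse_operations` is a hypothetical helper, not a SigAlign function):

```python
import re


def parse_operations(compact):
    """Split a string such as 'M23I1M27' into [('M', 23), ('I', 1), ('M', 27)]."""
    pairs = re.findall(r"([A-Z])(\d+)", compact)
    # Reject input that the pattern only partially covers.
    if "".join(case + count for case, count in pairs) != compact:
        raise ValueError(f"unexpected operations string: {compact!r}")
    return [(case, int(count)) for case, count in pairs]


parsed = parse_operations("M27S1M17S1M14")
print(parsed)
print(sum(count for _, count in parsed))  # total number of aligned columns
```

The summed counts give the alignment length, which can be compared against the `length` column as a consistency check.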

Additional Information

This Python library binds the default wrappers of the Aligner and Reference structs from the original library. These cover the functions needed for most common tasks. If you need to customize or optimize the Aligner or Reference, you can define your own versions in the original library and call them from Python through a Foreign Function Interface (FFI).

Support

For any questions or issues, please refer to the original project's GitHub issue tracker.

License

SigAlign for Python is released under the MIT license. For more details, see LICENSE.

Download files

Source Distribution

  • sigalign-0.2.0.tar.gz (66.0 kB)

Built Distributions

  • sigalign-0.2.0-cp37-abi3-win_amd64.whl (287.7 kB): CPython 3.7+, Windows x86-64
  • sigalign-0.2.0-cp37-abi3-manylinux_2_28_x86_64.whl (737.2 kB): CPython 3.7+, manylinux (glibc 2.28+), x86-64
  • sigalign-0.2.0-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (748.0 kB): CPython 3.7+, manylinux (glibc 2.17+), x86-64
  • sigalign-0.2.0-cp37-abi3-macosx_10_7_x86_64.whl (428.9 kB): CPython 3.7+, macOS 10.7+, x86-64
