Skip to main content

Library for multiple asymmetric alignments on different alphabets

Project description

MAlign

PyPI Python package Codacy Badge

MALIGN is a library for performing multiple alignments on sequences of different domains, allowing the usage of asymmetric scoring matrices. Multiple alignments are actual multiple alignments, scoring according to the overall probability of each alignment site, and not a succession of pairwise alignments gradually combined.

While intended for linguistic usage mostly, it can be used for aligning any type of sequential representation as long as the elements of each domain are hashable. It is particularly suitable as a general-purpose tool for cases where there are no prior hypotheses on the scoring matrices, which can be inferred or imputed (including from incomplete data), or optimized from observable examples to find local and global minima that can be used to explain the relationships between the sequences.

Installation and usage

The library can be installed as any standard Python library with pip, preferably within a virtual environment:

$ pip install malign

For most purposes, it is enough to pass the sequences to be aligned and specify one of the available methods (currently anw, the default, and yenksp) to the .multi_align() function, along with the maximum number of alignments to be returned (k):

>> import malign                                                                                                      
>> alms = malign.multi_align(["ATTCGGAT", "TACGGATTT"], k=2)                                                   
>> print(malign.tabulate_alms(alms))                                                                                  
| Idx   | Seq   |   Score |  #0  |  #1  |  #2  |  #3  |  #4  |  #5  |  #6  |  #7  |  #8  |  #9  |
|-------|-------|---------|------|------|------|------|------|------|------|------|------|------|
| 0     | A     |   -0.29 |  A   |  T   |  T   |  C   |  G   |  G   |  A   |  -   |  T   |  -   |
| 0     | B     |   -0.29 |  -   |  T   |  A   |  C   |  G   |  G   |  A   |  T   |  T   |  T   |
|       |       |         |      |      |      |      |      |      |      |      |      |      |
| 1     | A     |   -0.29 |  A   |  T   |  T   |  C   |  G   |  G   |  A   |  -   |  -   |  T   |
| 1     | B     |   -0.29 |  -   |  T   |  A   |  C   |  G   |  G   |  A   |  T   |  T   |  T   |

Scoring matrices can be either computed with the auxiliary methods, including various optimizations, or read from JSON files:

>> ita_rus = malign.ScoringMatrix()
>> ita_rus.load("docs/ita_rus.matrix")
>> alms = malign.multi_align(["Giacomo", "Яков"], k=4, method="anw", matrix=ita_rus)
>> print(malign.tabulate_alms(alms))
| Idx   | Seq   |   Score |  #0  |  #1  |  #2  |  #3  |  #4  |  #5  |  #6  |  #7  |
|-------|-------|---------|------|------|------|------|------|------|------|------|
| 0     | A     |    2.86 |  G   |  i   |  a   |  c   |  o   |  m   |  o   |      |
| 0     | B     |    2.86 |  -   |  Я   |  -   |  к   |  о   |  в   |  -   |      |
|       |       |         |      |      |      |      |      |      |      |      |
| 1     | A     |    2.29 |  G   |  i   |  a   |  c   |  o   |  m   |  o   |      |
| 1     | B     |    2.29 |  -   |  Я   |  -   |  к   |  о   |  -   |  в   |      |
|       |       |         |      |      |      |      |      |      |      |      |
| 2     | A     |    2.12 |  G   |  i   |  a   |  c   |  o   |  m   |  o   |  -   |
| 2     | B     |    2.12 |  -   |  Я   |  -   |  к   |  о   |  -   |  -   |  в   |
|       |       |         |      |      |      |      |      |      |      |      |
| 3     | A     |    2.12 |  G   |  i   |  a   |  c   |  o   |  m   |  o   |  -   |
| 3     | B     |    2.12 |  -   |  Я   |  -   |  к   |  -   |  -   |  о   |  в   |

More complex examples, including for matrix imputation and optimization, can be found in the documentation.

Changelog

Version 0.1:

  • First release for an internal announcement, testing, and community outreach

Version 0.2:

  • Major revision with asymmetric Needleman-Wunsch and Yen's k-shortest path implementation
  • Added scoring matrix object
  • Sort alignments in consistent and reproducible ways, even when the alignment score is the same

Version 0.3

  • Code improvements, including type annotation, and some refactoring
  • Allowing usage with any hashable Python object (not only strings)
  • Add methods for matrix imputation
  • Update of documentation
  • General preparations for public announcement

TODO

  • Complete documentation and setup readthedocs
  • Consider implementation of UPGMA and NJ multiple alignment
  • Add function/method to visualize the graphs used for the yenksp methods
  • Implement blocks and local search in anw and yenksp, with different starting/ending positions
  • Implement memoization where possible
  • Consider expanding dumb_malign by adding random gaps (pad_align), as an additional baseline method
  • Allow anw to work within a threshold percentage of the best score
  • Implement a method combining the results of the different algorithms
  • Add methods and demonstration for matrix optimization
  • Move to GitHub Actions

Community guidelines

While the author can be contacted directly for support, it is recommended that third parties use GitHub standard features, such as issues and pull requests, to contribute, report problems, or seek support.

Contributing guidelines, including a code of conduct, can be found in the CONTRIBUTING.md file.

Author and citation

The library is developed by Tiago Tresoldi (tiago.tresoldi@lingfil.uu.se).

The author has received funding from the Riksbankens Jubileumsfond (grant agreement ID: MXM19-1087:1, Cultural Evolution of Texts).

During the first stages of development, the author received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No. ERC Grant #715618, Computer-Assisted Language Comparison).

If you use malign, please cite it as:

Tresoldi, Tiago (2021). MALIGN, a library for multiple asymmetric alignments on different domains. Version 0.3. Uppsala: Uppsala Universitet.

In BibTeX:

@misc{Tresoldi2021malign,
  author = {Tresoldi, Tiago},
  title = {MALIGN, a library for multiple asymmetric alignments on different domains. Version 0.3},
  howpublished = {\url{https://github.com/tresoldi/malign}},
  address = {Uppsala},
  publisher = {Uppsala Universitet}
  year = {2021},
}

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

malign-0.3.2.tar.gz (4.8 MB view details)

Uploaded Source

Built Distribution

malign-0.3.2-py3-none-any.whl (4.8 MB view details)

Uploaded Python 3

File details

Details for the file malign-0.3.2.tar.gz.

File metadata

  • Download URL: malign-0.3.2.tar.gz
  • Upload date:
  • Size: 4.8 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/53.0.0 requests-toolbelt/0.9.1 tqdm/4.56.2 CPython/3.8.6

File hashes

Hashes for malign-0.3.2.tar.gz
Algorithm Hash digest
SHA256 b785e6d6feebc9851fa05d2162e195c4bea424db51a296d825daac33174277c3
MD5 4f4ae60bbf908c620e57c45755b35669
BLAKE2b-256 a2cd48f46b2a98a19203c1c35889660e308fae414c92cb3472f096c5201511d2

See more details on using hashes here.

File details

Details for the file malign-0.3.2-py3-none-any.whl.

File metadata

  • Download URL: malign-0.3.2-py3-none-any.whl
  • Upload date:
  • Size: 4.8 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/53.0.0 requests-toolbelt/0.9.1 tqdm/4.56.2 CPython/3.8.6

File hashes

Hashes for malign-0.3.2-py3-none-any.whl
Algorithm Hash digest
SHA256 8a6d3190d4a60fd46de6274864226ae85f1ab3e9df22cf376fe40da2ed9a6765
MD5 f5f8369ddfa53f8820cf5d63e2ac0c9c
BLAKE2b-256 8c89d1d32ec08c6f4bc12ab0e5d68e387a96cfab41b5c3a0a8f5430ee9328a2a

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page