Library for multiple asymmetric alignments on different alphabets

MAlign

MALIGN is a library for multiple alignment of sequences from different domains, with support for asymmetric scoring matrices. Its multiple alignments are true multiple alignments: each alignment site is scored by its overall probability, rather than by gradually combining a succession of pairwise alignments.

While intended mostly for linguistic use, it can align any type of sequential representation, as long as the elements of each domain are hashable. It is particularly suitable as a general-purpose tool when there are no prior hypotheses about the scoring matrices, which can be inferred or imputed (including from incomplete data), or optimized from observed examples to find local and global minima that can be used to explain the relationships between the sequences.
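
To make the idea of an asymmetric scoring matrix concrete, the following is a minimal sketch in plain Python. It is not MALIGN's implementation and does not use its API: it is a simplified pairwise Needleman-Wunsch, whereas the library scores true multiple alignments. The names align_pair and GAP, the dictionary-based matrix, and the default penalty of -1.0 are assumptions made only for this illustration. The point it shows is that the score of (a, b) may differ from the score of (b, a), and that any hashable element can serve as a symbol.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

GAP = "-"  # hypothetical gap symbol used only in this sketch

Pair = Tuple[Hashable, Hashable]


def align_pair(
    seq_a: Sequence[Hashable],
    seq_b: Sequence[Hashable],
    scores: Dict[Pair, float],
    default: float = -1.0,
) -> Tuple[float, List[Hashable], List[Hashable]]:
    """Globally align two sequences with a possibly asymmetric matrix.

    scores[(a, b)] is the score of symbol a (first domain) facing
    symbol b (second domain); scores[(b, a)] may be different.
    Pairs missing from the matrix, including gaps, score `default`.
    """

    def s(a: Hashable, b: Hashable) -> float:
        return scores.get((a, b), default)

    n, m = len(seq_a), len(seq_b)
    # dp[i][j] holds the best score for aligning seq_a[:i] with seq_b[:j]
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + s(seq_a[i - 1], GAP)
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + s(GAP, seq_b[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + s(seq_a[i - 1], seq_b[j - 1]),
                dp[i - 1][j] + s(seq_a[i - 1], GAP),
                dp[i][j - 1] + s(GAP, seq_b[j - 1]),
            )

    # Trace back from the bottom-right cell to recover one best alignment
    row_a: List[Hashable] = []
    row_b: List[Hashable] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + s(seq_a[i - 1], seq_b[j - 1]):
            row_a.append(seq_a[i - 1])
            row_b.append(seq_b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + s(seq_a[i - 1], GAP):
            row_a.append(seq_a[i - 1])
            row_b.append(GAP)
            i -= 1
        else:
            row_a.append(GAP)
            row_b.append(seq_b[j - 1])
            j -= 1
    return dp[n][m], row_a[::-1], row_b[::-1]


# An asymmetric toy matrix: "a" facing "b" is rewarded, "b" facing "a" is not.
scores = {("a", "b"): 2.0, ("b", "a"): -3.0}
score_ab, _, _ = align_pair(["a"], ["b"], scores)
score_ba, row_a, row_b = align_pair(["b"], ["a"], scores)
print(score_ab, score_ba)
print(row_a, row_b)
```

In this toy case, swapping the two sequences changes both the score and the alignment: in one direction the symbols are matched, while in the other two gaps are cheaper than the penalized correspondence.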

Installation and usage

The library can be installed as any standard Python library with pip, preferably within a virtual environment:

$ pip install malign

For most purposes, it is enough to pass the sequences to be aligned to the multi_align() function, along with one of the available methods (currently anw, the default, and yenksp) and the maximum number of alignments to return (k):

>>> import malign
>>> alms = malign.multi_align(["ATTCGGAT", "TACGGATTT"], k=2)
>>> print(malign.tabulate_alms(alms))
| Idx   | Seq   |   Score |  #0  |  #1  |  #2  |  #3  |  #4  |  #5  |  #6  |  #7  |  #8  |  #9  |
|-------|-------|---------|------|------|------|------|------|------|------|------|------|------|
| 0     | A     |   -0.29 |  A   |  T   |  T   |  C   |  G   |  G   |  A   |  -   |  T   |  -   |
| 0     | B     |   -0.29 |  -   |  T   |  A   |  C   |  G   |  G   |  A   |  T   |  T   |  T   |
|       |       |         |      |      |      |      |      |      |      |      |      |      |
| 1     | A     |   -0.29 |  A   |  T   |  T   |  C   |  G   |  G   |  A   |  -   |  -   |  T   |
| 1     | B     |   -0.29 |  -   |  T   |  A   |  C   |  G   |  G   |  A   |  T   |  T   |  T   |

Scoring matrices can be either computed with the auxiliary methods, including various optimizations, or read from JSON files:

>>> ita_rus = malign.ScoringMatrix()
>>> ita_rus.load("docs/ita_rus.matrix")
>>> alms = malign.multi_align(["Giacomo", "Яков"], k=4, method="anw", matrix=ita_rus)
>>> print(malign.tabulate_alms(alms))
| Idx   | Seq   |   Score |  #0  |  #1  |  #2  |  #3  |  #4  |  #5  |  #6  |  #7  |
|-------|-------|---------|------|------|------|------|------|------|------|------|
| 0     | A     |    2.86 |  G   |  i   |  a   |  c   |  o   |  m   |  o   |      |
| 0     | B     |    2.86 |  -   |  Я   |  -   |  к   |  о   |  в   |  -   |      |
|       |       |         |      |      |      |      |      |      |      |      |
| 1     | A     |    2.29 |  G   |  i   |  a   |  c   |  o   |  m   |  o   |      |
| 1     | B     |    2.29 |  -   |  Я   |  -   |  к   |  о   |  -   |  в   |      |
|       |       |         |      |      |      |      |      |      |      |      |
| 2     | A     |    2.12 |  G   |  i   |  a   |  c   |  o   |  m   |  o   |  -   |
| 2     | B     |    2.12 |  -   |  Я   |  -   |  к   |  о   |  -   |  -   |  в   |
|       |       |         |      |      |      |      |      |      |      |      |
| 3     | A     |    2.12 |  G   |  i   |  a   |  c   |  o   |  m   |  o   |  -   |
| 3     | B     |    2.12 |  -   |  Я   |  -   |  к   |  -   |  -   |  о   |  в   |

More complex examples, including for matrix imputation and optimization, can be found in the documentation.

Changelog

Version 0.1:

  • First release for an internal announcement, testing, and community outreach

Version 0.2:

  • Major revision with asymmetric Needleman-Wunsch and Yen's k-shortest path implementation
  • Added scoring matrix object
  • Sort alignments in consistent and reproducible ways, even when the alignment score is the same

Version 0.3:

  • Code improvements, including type annotation, and some refactoring
  • Allowing usage with any hashable Python object (not only strings)
  • Add methods for matrix imputation
  • Update of documentation
  • General preparations for public announcement

TODO

  • Complete the documentation and set up readthedocs
  • Consider implementation of UPGMA and NJ multiple alignment
  • Add function/method to visualize the graphs used for the yenksp method
  • Implement blocks and local search in anw and yenksp, with different starting/ending positions
  • Implement memoization where possible
  • Consider expanding dumb_malign by adding random gaps (pad_align), as an additional baseline method
  • Allow anw to work within a threshold percentage of the best score
  • Implement a method combining the results of the different algorithms
  • Add methods and demonstration for matrix optimization
  • Move to GitHub Actions

Community guidelines

While the author can be contacted directly for support, it is recommended that third parties use GitHub standard features, such as issues and pull requests, to contribute, report problems, or seek support.

Contributing guidelines, including a code of conduct, can be found in the CONTRIBUTING.md file.

Author and citation

The library is developed by Tiago Tresoldi (tiago.tresoldi@lingfil.uu.se).

The author has received funding from the Riksbankens Jubileumsfond (grant agreement ID: MXM19-1087:1, Cultural Evolution of Texts).

During the first stages of development, the author received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (ERC grant agreement No. 715618, Computer-Assisted Language Comparison).

If you use malign, please cite it as:

Tresoldi, Tiago (2021). MALIGN, a library for multiple asymmetric alignments on different domains. Version 0.3. Uppsala: Uppsala Universitet.

In BibTeX:

@misc{Tresoldi2021malign,
  author = {Tresoldi, Tiago},
  title = {MALIGN, a library for multiple asymmetric alignments on different domains. Version 0.3},
  howpublished = {\url{https://github.com/tresoldi/malign}},
  address = {Uppsala},
  publisher = {Uppsala Universitet},
  year = {2021},
}
