Library for multiple asymmetric alignments on different alphabets

MALIGN is a library for performing multiple alignments on sequences of different alphabets. Each sequence can have its own domain, which in turn makes it possible to use asymmetric and sparse scoring matrices, including for gaps, and to perform true single-pass multiple alignment with computation of the k-best alignments. While intended mostly for linguistic usage, it can align any type of sequential representation, and it is particularly suitable as a general-purpose tool when there are no prior hypotheses on the scoring matrices.
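To make the idea of asymmetric scoring over different alphabets concrete, the following is a minimal sketch of a pairwise Needleman-Wunsch alignment in plain Python. It is not the library's implementation and does not use the malign API: the function name `pairwise_nw`, the dictionary-based scorer, and the `default` fallback for missing (sparse) entries are all assumptions made for illustration. The point it shows is that scores are keyed by ordered pairs, so a gap is just another symbol and a gap in the first sequence may cost something different from a gap in the second.

```python
from typing import Dict, Tuple

GAP = "-"
# Hypothetical scorer type: keys are (symbol_from_a, symbol_from_b).
Scorer = Dict[Tuple[str, str], float]


def pairwise_nw(seq_a: str, seq_b: str, scorer: Scorer, default: float = -1.0):
    """Globally align two sequences with a possibly asymmetric, sparse scorer."""

    def score(a: str, b: str) -> float:
        # Sparse matrix: pairs that are not listed fall back to `default`.
        return scorer.get((a, b), default)

    n, m = len(seq_a), len(seq_b)
    # table[i][j] holds the best score for aligning seq_a[:i] with seq_b[:j].
    table = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        table[i][0] = table[i - 1][0] + score(seq_a[i - 1], GAP)
    for j in range(1, m + 1):
        table[0][j] = table[0][j - 1] + score(GAP, seq_b[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            table[i][j] = max(
                table[i - 1][j - 1] + score(seq_a[i - 1], seq_b[j - 1]),
                table[i - 1][j] + score(seq_a[i - 1], GAP),
                table[i][j - 1] + score(GAP, seq_b[j - 1]),
            )

    # Trace back from the bottom-right cell to recover one best alignment.
    row_a, row_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and
                table[i][j] == table[i - 1][j - 1] + score(seq_a[i - 1], seq_b[j - 1])):
            row_a.append(seq_a[i - 1])
            row_b.append(seq_b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and table[i][j] == table[i - 1][j] + score(seq_a[i - 1], GAP):
            row_a.append(seq_a[i - 1])
            row_b.append(GAP)
            i -= 1
        else:
            row_a.append(GAP)
            row_b.append(seq_b[j - 1])
            j -= 1
    return table[n][m], "".join(reversed(row_a)), "".join(reversed(row_b))


# Toy scorer: dropping "a" is expensive, dropping "b" is free.
toy = {("a", "x"): 1.0, ("b", "x"): 1.0, ("a", GAP): -3.0, ("b", GAP): 0.0}
print(pairwise_nw("ab", "x", toy))
```

The sketch returns a single best alignment for two sequences only; the k-best and true multiple alignment described above require more machinery than is shown here.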

Installation and usage

In any standard Python environment, malign can be installed with pip:

$ pip install malign

For most purposes, it is enough to pass the sequences to be aligned and a method (such as anw, the asymmetric Needleman-Wunsch, or yenksp, Yen’s k-shortest paths) to the multi_align() function:

>>> import malign
>>> alms = malign.multi_align(["ATTCGGAT", "TACGGATTT"], "anw", k=2)
>>> print(malign.tabulate_alms(alms))
| Idx   | Seq   |   Score |  #0  |  #1  |  #2  |  #3  |  #4  |  #5  |  #6  |  #7  |  #8  |  #9  |
|-------|-------|---------|------|------|------|------|------|------|------|------|------|------|
| 0     | A     |   -0.29 |  A   |  T   |  T   |  C   |  G   |  G   |  A   |  -   |  T   |  -   |
| 0     | B     |   -0.29 |  -   |  T   |  A   |  C   |  G   |  G   |  A   |  T   |  T   |  T   |
|       |       |         |      |      |      |      |      |      |      |      |      |      |
| 1     | A     |   -0.29 |  A   |  T   |  T   |  C   |  G   |  G   |  A   |  -   |  -   |  T   |
| 1     | B     |   -0.29 |  -   |  T   |  A   |  C   |  G   |  G   |  A   |  T   |  T   |  T   |

Scoring matrices can either be computed with the auxiliary methods, which include various optimizations, or read from JSON files:

>>> ita_rus = malign.ScoringMatrix(filename="docs/ita_rus.matrix")
>>> alms = malign.multi_align(["Giacomo", "Яков"], k=4, method="anw", matrix=ita_rus)
>>> print(malign.tabulate_alms(alms))
| Idx   | Seq   |   Score |  #0  |  #1  |  #2  |  #3  |  #4  |  #5  |  #6  |  #7  |
|-------|-------|---------|------|------|------|------|------|------|------|------|
| 0     | A     |    2.86 |  G   |  i   |  a   |  c   |  o   |  m   |  o   |      |
| 0     | B     |    2.86 |  -   |  Я   |  -   |  к   |  о   |  в   |  -   |      |
|       |       |         |      |      |      |      |      |      |      |      |
| 1     | A     |    2.29 |  G   |  i   |  a   |  c   |  o   |  m   |  o   |      |
| 1     | B     |    2.29 |  -   |  Я   |  -   |  к   |  о   |  -   |  в   |      |
|       |       |         |      |      |      |      |      |      |      |      |
| 2     | A     |    2.12 |  G   |  i   |  a   |  c   |  o   |  m   |  o   |  -   |
| 2     | B     |    2.12 |  -   |  Я   |  -   |  к   |  о   |  -   |  -   |  в   |
|       |       |         |      |      |      |      |      |      |      |      |
| 3     | A     |    2.12 |  G   |  i   |  a   |  c   |  o   |  m   |  o   |  -   |
| 3     | B     |    2.12 |  -   |  Я   |  -   |  к   |  -   |  -   |  о   |  в   |

The library can also be used through the malign command-line tool. If no matrix is provided, an identity matrix is used by default.

$  malign baba,maa
| Idx   | Seq   |   Score |  #0  |  #1  |  #2  |  #3  |
|-------|-------|---------|------|------|------|------|
| 0     | A     |   -0.47 |  b   |  a   |  b   |  a   |
| 0     | B     |   -0.47 |  m   |  a   |  -   |  a   |
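As a rough illustration of what an identity scoring matrix looks like, the sketch below builds one as a plain dictionary over two alphabets. It does not use the malign API, and the helper name `identity_scorer` as well as the `match`, `mismatch`, and `gap` values are placeholder assumptions; they are not the library's defaults and will not reproduce the scores shown in the table above.

```python
from itertools import product
from typing import Dict, Iterable, Tuple

GAP = "-"


def identity_scorer(
    alphabet_a: Iterable[str],
    alphabet_b: Iterable[str],
    match: float = 1.0,      # assumed value, for illustration only
    mismatch: float = -1.0,  # assumed value, for illustration only
    gap: float = -1.0,       # assumed value, for illustration only
) -> Dict[Tuple[str, str], float]:
    """Build a scorer that rewards identical symbols and penalizes the rest."""
    symbols_a = set(alphabet_a) | {GAP}
    symbols_b = set(alphabet_b) | {GAP}
    scorer = {}
    for a, b in product(symbols_a, symbols_b):
        if a == GAP and b == GAP:
            continue  # a column made only of gaps is never scored
        if GAP in (a, b):
            scorer[(a, b)] = gap
        elif a == b:
            scorer[(a, b)] = match
        else:
            scorer[(a, b)] = mismatch
    return scorer


# Each sequence contributes its own alphabet, so the matrix need not be square.
scorer = identity_scorer("baba", "maa")
```

Because every pair is keyed by position (first sequence, second sequence), such a dictionary can later be edited entry by entry to make it asymmetric, for example by changing only the gap scores of one sequence.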

$  malign --matrix docs/ita_rus.matrix -k 6 Giacomo,Яков
| Idx   | Seq   |   Score |  #0  |  #1  |  #2  |  #3  |  #4  |  #5  |  #6  |  #7  |
|-------|-------|---------|------|------|------|------|------|------|------|------|
| 0     | A     |    2.86 |  G   |  i   |  a   |  c   |  o   |  m   |  o   |      |
| 0     | B     |    2.86 |  -   |  Я   |  -   |  к   |  о   |  в   |  -   |      |
|       |       |         |      |      |      |      |      |      |      |      |
| 1     | A     |    2.29 |  G   |  i   |  a   |  c   |  o   |  m   |  o   |      |
| 1     | B     |    2.29 |  -   |  Я   |  -   |  к   |  о   |  -   |  в   |      |
|       |       |         |      |      |      |      |      |      |      |      |
| 2     | A     |    2.12 |  G   |  i   |  a   |  c   |  o   |  m   |  o   |  -   |
| 2     | B     |    2.12 |  -   |  Я   |  -   |  к   |  о   |  -   |  -   |  в   |
|       |       |         |      |      |      |      |      |      |      |      |
| 3     | A     |    2.12 |  G   |  i   |  a   |  c   |  o   |  m   |  o   |  -   |
| 3     | B     |    2.12 |  -   |  Я   |  -   |  к   |  -   |  -   |  о   |  в   |
|       |       |         |      |      |      |      |      |      |      |      |
| 4     | A     |    2.12 |  G   |  i   |  a   |  c   |  o   |  m   |  -   |  o   |
| 4     | B     |    2.12 |  -   |  Я   |  -   |  к   |  о   |  -   |  в   |  -   |
|       |       |         |      |      |      |      |      |      |      |      |
| 5     | A     |    2.12 |  G   |  i   |  a   |  c   |  o   |  -   |  m   |  o   |
| 5     | B     |    2.12 |  -   |  Я   |  -   |  к   |  о   |  в   |  -   |  -   |

Changelog

Version 0.1:
- First release for internal announcement, testing, and community outreach

Version 0.2:
- Major revision with asymmetric Needleman-Wunsch and Yen’s k-shortest path implementation
- Added scoring matrix object
- Sort alignments in consistent and reproducible ways, even when the alignment score is the same

Roadmap

Version 0.3:
- Complete documentation and set up readthedocs
- Add new inference method to sparse matrices using impurity/entropy
- Describe matrix filling methods in more detail
- Consider implementation of UPGMA and NJ multiple alignment
- Add function/method to visualize the graphs used for the yenksp methods
- Implement blocks and local search in anw and yenksp, with different starting/ending positions
- Implement memoization where possible
- Consider expanding dumb_malign by adding random gaps (pad_align), as an additional baseline method
- Allow anw to work within a threshold percentage of the best score
- Implement a method combining the results of the different algorithms
- Add methods and demonstration for matrix optimization

Community guidelines

While the author can be contacted directly for support, it is recommended that third parties use GitHub standard features, such as issues and pull requests, to contribute, report problems, or seek support.

Contributing guidelines, including a code of conduct, can be found in the CONTRIBUTING.md file.

Author and citation

The library is developed by Tiago Tresoldi (tresoldi@shh.mpg.de).

The author has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No. 715618, Computer-Assisted Language Comparison).

If you use malign, please cite it as:

Tresoldi, Tiago (2020). MALIGN, a library for multiple asymmetric alignments on different alphabets. Version 0.2. Jena.

In BibTeX:

@misc{Tresoldi2020malign,
  author = {Tresoldi, Tiago},
  title = {MALIGN, a library for multiple asymmetric alignments on different alphabets. Version 0.2},
  howpublished = {\url{https://github.com/tresoldi/malign}},
  address = {Jena},
  publisher = {Max Planck Institute for the Science of Human History},
  year = {2020},
}
