malign

Library for multiple asymmetric alignments on different alphabets

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3
Topic
- Software Development :: Libraries

Project description

MALIGN is a library for multiple alignment of sequences drawn from different alphabets. Each sequence can have its own domain, which in turn allows asymmetric and sparse scoring matrices (including scores for gaps) and true single-pass multiple alignment with computation of the k-best alignments. Although intended mostly for linguistic use, it can align any type of sequential representation, and it is particularly suitable as a general-purpose tool when there are no prior hypotheses about the scoring matrices.
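To make the idea of an asymmetric, sparse scoring matrix concrete, here is a minimal sketch in plain Python. It is not malign's `ScoringMatrix` class: the dictionary layout, the `score` helper, the default value, and all numbers are assumptions chosen only for illustration.

```python
# Hypothetical illustration; this is NOT malign's ScoringMatrix class.
GAP = "-"

# Sparse scores keyed by (symbol from domain A, symbol from domain B).
# Domain A is Latin script and domain B is Cyrillic, so ("c", "к") and
# ("к", "c") are different keys: the matrix is asymmetric.
scores = {
    ("c", "к"): 3.0,
    ("m", GAP): -0.5,  # a gap in B opposite "m" is cheap (made-up value)
    (GAP, "в"): -2.0,  # a gap in A opposite "в" is expensive (made-up value)
}


def score(a, b, default=-1.0):
    """Score for aligning `a` (domain A) with `b` (domain B).

    Pairs missing from the sparse dictionary fall back to `default`,
    which is an assumed value, not one taken from the library.
    """
    return scores.get((a, b), default)


print(score("c", "к"))  # stored pair
print(score("к", "c"))  # reversed pair is not stored, so the default applies
```

Because gaps are ordinary keys, the cost of a gap can depend both on which sequence receives it and on the symbol it is aligned with.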

Installation and usage

In any standard Python environment, malign can be installed with pip, like any other Python library:

$ pip install malign

For most purposes, it is enough to pass the sequences to be aligned and a method (such as anw or yenksp) to the multi_align() function:

>>> import malign
>>> alms = malign.multi_align(["ATTCGGAT", "TACGGATTT"], "anw", k=2)
>>> print(malign.tabulate_alms(alms))
| Idx   | Seq   |   Score |  #0  |  #1  |  #2  |  #3  |  #4  |  #5  |  #6  |  #7  |  #8  |  #9  |
|-------|-------|---------|------|------|------|------|------|------|------|------|------|------|
| 0     | A     |   -0.29 |  A   |  T   |  T   |  C   |  G   |  G   |  A   |  -   |  T   |  -   |
| 0     | B     |   -0.29 |  -   |  T   |  A   |  C   |  G   |  G   |  A   |  T   |  T   |  T   |
|       |       |         |      |      |      |      |      |      |      |      |      |      |
| 1     | A     |   -0.29 |  A   |  T   |  T   |  C   |  G   |  G   |  A   |  -   |  -   |  T   |
| 1     | B     |   -0.29 |  -   |  T   |  A   |  C   |  G   |  G   |  A   |  T   |  T   |  T   |
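The anw method is described in the changelog as an asymmetric Needleman-Wunsch. As a rough illustration of that family of algorithms, the sketch below implements a textbook pairwise Needleman-Wunsch in which gap scores are looked up through the same scoring function as symbol pairs. It is not the library's implementation: the function names and the scoring values are assumptions, it returns only one best alignment, and it will not reproduce the scores shown in the tables.

```python
GAP = "-"


def needleman_wunsch(seq_a, seq_b, score):
    """Globally align two sequences with a possibly asymmetric `score`.

    `score(a, b)` is also consulted for gaps, as score(a, GAP) and
    score(GAP, b). Returns (best score, aligned row A, aligned row B).
    """
    n, m = len(seq_a), len(seq_b)
    # best[i][j]: best score for aligning seq_a[:i] with seq_b[:j].
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    move = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        best[i][0] = best[i - 1][0] + score(seq_a[i - 1], GAP)
        move[i][0] = "up"
    for j in range(1, m + 1):
        best[0][j] = best[0][j - 1] + score(GAP, seq_b[j - 1])
        move[0][j] = "left"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            a, b = seq_a[i - 1], seq_b[j - 1]
            options = [
                (best[i - 1][j - 1] + score(a, b), "diag"),
                (best[i - 1][j] + score(a, GAP), "up"),
                (best[i][j - 1] + score(GAP, b), "left"),
            ]
            best[i][j], move[i][j] = max(options, key=lambda opt: opt[0])

    # Trace back from the bottom-right corner to recover one alignment.
    row_a, row_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        step = move[i][j]
        if step == "diag":
            row_a.append(seq_a[i - 1])
            row_b.append(seq_b[j - 1])
            i, j = i - 1, j - 1
        elif step == "up":
            row_a.append(seq_a[i - 1])
            row_b.append(GAP)
            i -= 1
        else:  # "left"
            row_a.append(GAP)
            row_b.append(seq_b[j - 1])
            j -= 1
    return best[n][m], row_a[::-1], row_b[::-1]


def identity_score(a, b):
    """Assumed toy scoring: +1 for a match, -1 for a mismatch or a gap."""
    if GAP in (a, b):
        return -1.0
    return 1.0 if a == b else -1.0


total, row_a, row_b = needleman_wunsch("baba", "maa", identity_score)
print(total, " ".join(row_a), " ".join(row_b), sep="\n")
```

Passing a different `score` function, for example one that returns different values for `score(a, GAP)` and `score(GAP, a)`, is enough to make the alignment asymmetric.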

Scoring matrices can either be computed with the auxiliary methods, which include various optimizations, or read from JSON files:

>>> ita_rus = malign.ScoringMatrix(filename="docs/ita_rus.matrix")
>>> alms = malign.multi_align(["Giacomo", "Яков"], k=4, method="anw", matrix=ita_rus)
>>> print(malign.tabulate_alms(alms))
| Idx   | Seq   |   Score |  #0  |  #1  |  #2  |  #3  |  #4  |  #5  |  #6  |  #7  |
|-------|-------|---------|------|------|------|------|------|------|------|------|
| 0     | A     |    2.86 |  G   |  i   |  a   |  c   |  o   |  m   |  o   |      |
| 0     | B     |    2.86 |  -   |  Я   |  -   |  к   |  о   |  в   |  -   |      |
|       |       |         |      |      |      |      |      |      |      |      |
| 1     | A     |    2.29 |  G   |  i   |  a   |  c   |  o   |  m   |  o   |      |
| 1     | B     |    2.29 |  -   |  Я   |  -   |  к   |  о   |  -   |  в   |      |
|       |       |         |      |      |      |      |      |      |      |      |
| 2     | A     |    2.12 |  G   |  i   |  a   |  c   |  o   |  m   |  o   |  -   |
| 2     | B     |    2.12 |  -   |  Я   |  -   |  к   |  о   |  -   |  -   |  в   |
|       |       |         |      |      |      |      |      |      |      |      |
| 3     | A     |    2.12 |  G   |  i   |  a   |  c   |  o   |  m   |  o   |  -   |
| 3     | B     |    2.12 |  -   |  Я   |  -   |  к   |  -   |  -   |  о   |  в   |

The library can also be used through the malign command-line tool. If no matrix is provided, an identity matrix is used by default.

$ malign baba,maa
| Idx   | Seq   |   Score |  #0  |  #1  |  #2  |  #3  |
|-------|-------|---------|------|------|------|------|
| 0     | A     |   -0.47 |  b   |  a   |  b   |  a   |
| 0     | B     |   -0.47 |  m   |  a   |  -   |  a   |

$ malign --matrix docs/ita_rus.matrix -k 6 Giacomo,Яков
| Idx   | Seq   |   Score |  #0  |  #1  |  #2  |  #3  |  #4  |  #5  |  #6  |  #7  |
|-------|-------|---------|------|------|------|------|------|------|------|------|
| 0     | A     |    2.86 |  G   |  i   |  a   |  c   |  o   |  m   |  o   |      |
| 0     | B     |    2.86 |  -   |  Я   |  -   |  к   |  о   |  в   |  -   |      |
|       |       |         |      |      |      |      |      |      |      |      |
| 1     | A     |    2.29 |  G   |  i   |  a   |  c   |  o   |  m   |  o   |      |
| 1     | B     |    2.29 |  -   |  Я   |  -   |  к   |  о   |  -   |  в   |      |
|       |       |         |      |      |      |      |      |      |      |      |
| 2     | A     |    2.12 |  G   |  i   |  a   |  c   |  o   |  m   |  o   |  -   |
| 2     | B     |    2.12 |  -   |  Я   |  -   |  к   |  о   |  -   |  -   |  в   |
|       |       |         |      |      |      |      |      |      |      |      |
| 3     | A     |    2.12 |  G   |  i   |  a   |  c   |  o   |  m   |  o   |  -   |
| 3     | B     |    2.12 |  -   |  Я   |  -   |  к   |  -   |  -   |  о   |  в   |
|       |       |         |      |      |      |      |      |      |      |      |
| 4     | A     |    2.12 |  G   |  i   |  a   |  c   |  o   |  m   |  -   |  o   |
| 4     | B     |    2.12 |  -   |  Я   |  -   |  к   |  о   |  -   |  в   |  -   |
|       |       |         |      |      |      |      |      |      |      |      |
| 5     | A     |    2.12 |  G   |  i   |  a   |  c   |  o   |  -   |  m   |  o   |
| 5     | B     |    2.12 |  -   |  Я   |  -   |  к   |  о   |  в   |  -   |  -   |

Changelog

Version 0.1:
- First release for internal announcement, testing, and community outreach

Version 0.2:
- Major revision with asymmetric Needleman-Wunsch and Yen’s k-shortest path implementation
- Added scoring matrix object
- Sort alignments in consistent and reproducible ways, even when the alignment score is the same
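The reproducible ordering mentioned for version 0.2 can be pictured as sorting on a composite key. The following is a minimal sketch, assuming alignments are plain dictionaries with a `score` and a tuple of aligned `rows`; both the data layout and the tie-breaking rule are hypothetical and need not match what the library does internally.

```python
def sort_alignments(alms):
    """Sort by descending score, breaking ties on the aligned rows.

    The tie-break (lexicographic comparison of the rows) is an assumed
    rule; any total order on the rows would make the result reproducible.
    """
    return sorted(alms, key=lambda alm: (-alm["score"], alm["rows"]))


alms = [
    {"score": -0.29, "rows": ("ATTCGGA-T-", "-TACGGATTT")},
    {"score": -0.29, "rows": ("ATTCGGA--T", "-TACGGATTT")},
]

# The result is the same no matter how the input happens to be ordered.
print(sort_alignments(alms) == sort_alignments(alms[::-1]))
```

Without the second key component, alignments with equal scores would keep whatever order the search produced them in, which can vary between methods and runs.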

Roadmap

Version 0.3:
- Complete documentation and set up readthedocs
- Add new inference method to sparse matrices using impurity/entropy
- Describe matrix filling methods in more detail
- Consider implementation of UPGMA and NJ multiple alignment
- Add function/method to visualize the graphs used for the yenksp methods
- Implement blocks and local search in anw and yenksp, with different starting/ending positions
- Implement memoization where possible
- Consider expanding dumb_malign by adding random gaps (pad_align), as an additional baseline method
- Allow anw to work within a threshold percentage of the best score
- Implement a method combining the results of the different algorithms
- Add methods and demonstration for matrix optimization

Community guidelines

While the author can be contacted directly for support, it is recommended that third parties use GitHub standard features, such as issues and pull requests, to contribute, report problems, or seek support.

Contributing guidelines, including a code of conduct, can be found in the CONTRIBUTING.md file.

Author and citation

The library is developed by Tiago Tresoldi (tresoldi@shh.mpg.de).

The author has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No. 715618, Computer-Assisted Language Comparison).

If you use malign, please cite it as:

Tresoldi, Tiago (2020). MALIGN, a library for multiple asymmetric alignments on different alphabets. Version 0.2. Jena.

In BibTeX:

@misc{Tresoldi2020malign,
  author = {Tresoldi, Tiago},
  title = {MALIGN, a library for multiple asymmetric alignments on different alphabets. Version 0.2},
  howpublished = {\url{https://github.com/tresoldi/malign}},
  address = {Jena},
  publisher = {Max Planck Institute for the Science of Human History},
  year = {2020},
}

Project details

Release history

- 0.3.2: Feb 14, 2021
- 0.3.1: Feb 14, 2021
- 0.3: Feb 14, 2021
- 0.2: Aug 7, 2020 (this version)
- 0.1.1: Feb 25, 2020
- 0.1: Feb 25, 2020

Download files

Source Distribution

malign-0.2.tar.gz (4.8 MB)

Uploaded Aug 7, 2020 Source

Built Distribution

malign-0.2-py3-none-any.whl (4.8 MB)

Uploaded Aug 7, 2020 Python 3

File details

Details for the file malign-0.2.tar.gz.

File metadata

Download URL: malign-0.2.tar.gz
Upload date: Aug 7, 2020
Size: 4.8 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.7.7

File hashes

Hashes for malign-0.2.tar.gz
Algorithm	Hash digest
SHA256	`8a144ee758a6f9598209d086feb0108fe479ac8c0496b3cda0cbb4202057b80b`
MD5	`da63cf0666761f291375f7a2b06e456f`
BLAKE2b-256	`a68934f15c445994674330db1ae07989b0d9ce08aefc558cb7032d7db670ae28`

File details

Details for the file malign-0.2-py3-none-any.whl.

File metadata

Download URL: malign-0.2-py3-none-any.whl
Upload date: Aug 7, 2020
Size: 4.8 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.7.7

File hashes

Hashes for malign-0.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`f5f0057d73771dcf728e958c0c204569f70c63cbdd99fc413b0a782ed4ba1492`
MD5	`6939bd1755ba94ba7c395a964bbd9e04`
BLAKE2b-256	`adac2f9651f67cc295136822cc44a6c588c848d4cd02d8d9fcd567729d971691`
