Library for multiple asymmetric alignments on different alphabets
Project description
MAlign
MALIGN is a library for performing multiple alignments on sequences of different domains, allowing the usage of asymmetric scoring matrices. Multiple alignments are actual multiple alignments, scoring according to the overall probability of each alignment site, and not a succession of pairwise alignments gradually combined.
While intended for linguistic usage mostly, it can be used for aligning any type of sequential representation as long as the elements of each domain are hashable. It is particularly suitable as a general-purpose tool for cases where there are no prior hypotheses on the scoring matrices, which can be inferred or imputed (including from incomplete data), or optimized from observable examples to find local and global minima that can be used to explain the relationships between the sequences.
Installation and usage
The library can be installed as any standard Python library with
pip
, preferably within a virtual environment:
$ pip install malign
For most purposes, it is enough to pass the sequences to be aligned and
specify one of the available methods (currently anw
, the default,
and yenksp
) to the .multi_align()
function, along with the maximum
number of alignments to be returned (k
):
>> import malign
>> alms = malign.multi_align(["ATTCGGAT", "TACGGATTT"], k=2)
>> print(malign.tabulate_alms(alms))
| Idx | Seq | Score | #0 | #1 | #2 | #3 | #4 | #5 | #6 | #7 | #8 | #9 |
|-------|-------|---------|------|------|------|------|------|------|------|------|------|------|
| 0 | A | -0.29 | A | T | T | C | G | G | A | - | T | - |
| 0 | B | -0.29 | - | T | A | C | G | G | A | T | T | T |
| | | | | | | | | | | | | |
| 1 | A | -0.29 | A | T | T | C | G | G | A | - | - | T |
| 1 | B | -0.29 | - | T | A | C | G | G | A | T | T | T |
Scoring matrices can be either computed with the auxiliary methods, including various optimizations, or read from JSON files:
>> ita_rus = malign.ScoringMatrix()
>> ita_rus.load("docs/ita_rus.matrix")
>> alms = malign.multi_align(["Giacomo", "Яков"], k=4, method="anw", matrix=ita_rus)
>> print(malign.tabulate_alms(alms))
| Idx | Seq | Score | #0 | #1 | #2 | #3 | #4 | #5 | #6 | #7 |
|-------|-------|---------|------|------|------|------|------|------|------|------|
| 0 | A | 2.86 | G | i | a | c | o | m | o | |
| 0 | B | 2.86 | - | Я | - | к | о | в | - | |
| | | | | | | | | | | |
| 1 | A | 2.29 | G | i | a | c | o | m | o | |
| 1 | B | 2.29 | - | Я | - | к | о | - | в | |
| | | | | | | | | | | |
| 2 | A | 2.12 | G | i | a | c | o | m | o | - |
| 2 | B | 2.12 | - | Я | - | к | о | - | - | в |
| | | | | | | | | | | |
| 3 | A | 2.12 | G | i | a | c | o | m | o | - |
| 3 | B | 2.12 | - | Я | - | к | - | - | о | в |
More complex examples, including for matrix imputation and optimization, can be found in the documentation.
Changelog
Version 0.1:
- First release for an internal announcement, testing, and community outreach
Version 0.2:
- Major revision with asymmetric Needleman-Wunsch and Yen's
k
-shortest path implementation - Added scoring matrix object
- Sort alignments in consistent and reproducible ways, even when the alignment score is the same
Version 0.3
- Code improvements, including type annotation, and some refactoring
- Allowing usage with any hashable Python object (not only strings)
- Add methods for matrix imputation
- Update of documentation
- General preparations for public announcement
TODO
- Complete documentation and setup
readthedocs
- Consider implementation of UPGMA and NJ multiple alignment
- Add function/method to visualize the graphs used for the
yenksp
methods - Implement blocks and local search in
anw
andyenksp
, with different starting/ending positions - Implement memoization where possible
- Consider expanding dumb_malign by adding random gaps (
pad_align
), as an additional baseline method - Allow
anw
to work within a threshold percentage of the best score - Implement a method combining the results of the different algorithms
- Add methods and demonstration for matrix optimization
- Move to GitHub Actions
Community guidelines
While the author can be contacted directly for support, it is recommended that third parties use GitHub standard features, such as issues and pull requests, to contribute, report problems, or seek support.
Contributing guidelines, including a code of conduct, can be found in
the CONTRIBUTING.md
file.
Author and citation
The library is developed by Tiago Tresoldi (tiago.tresoldi@lingfil.uu.se).
The author has received funding from the Riksbankens Jubileumsfond (grant agreement ID: MXM19-1087:1, Cultural Evolution of Texts).
During the first stages of development, the author received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No. ERC Grant #715618, Computer-Assisted Language Comparison).
If you use malign
, please cite it as:
Tresoldi, Tiago (2021). MALIGN, a library for multiple asymmetric alignments on different domains. Version 0.3. Uppsala: Uppsala Universitet.
In BibTeX:
@misc{Tresoldi2021malign,
author = {Tresoldi, Tiago},
title = {MALIGN, a library for multiple asymmetric alignments on different domains. Version 0.3},
howpublished = {\url{https://github.com/tresoldi/malign}},
address = {Uppsala},
publisher = {Uppsala Universitet}
year = {2021},
}
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file malign-0.3.2.tar.gz
.
File metadata
- Download URL: malign-0.3.2.tar.gz
- Upload date:
- Size: 4.8 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/53.0.0 requests-toolbelt/0.9.1 tqdm/4.56.2 CPython/3.8.6
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | b785e6d6feebc9851fa05d2162e195c4bea424db51a296d825daac33174277c3 |
|
MD5 | 4f4ae60bbf908c620e57c45755b35669 |
|
BLAKE2b-256 | a2cd48f46b2a98a19203c1c35889660e308fae414c92cb3472f096c5201511d2 |
File details
Details for the file malign-0.3.2-py3-none-any.whl
.
File metadata
- Download URL: malign-0.3.2-py3-none-any.whl
- Upload date:
- Size: 4.8 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/53.0.0 requests-toolbelt/0.9.1 tqdm/4.56.2 CPython/3.8.6
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 8a6d3190d4a60fd46de6274864226ae85f1ab3e9df22cf376fe40da2ed9a6765 |
|
MD5 | f5f8369ddfa53f8820cf5d63e2ac0c9c |
|
BLAKE2b-256 | 8c89d1d32ec08c6f4bc12ab0e5d68e387a96cfab41b5c3a0a8f5430ee9328a2a |