tools for working with a3m files for structure prediction
a3mtools
DEPRECATED: This package has been renamed from a3mtools to a3mcat. Please use the new package instead: a3mcat
Tools for working with a3m files. Designed to help generate input for structure prediction tools like AlphaFold.
>meow
---------/\_/\--
--------( o.o )-
--------- >^< --
Main features:
- Import a3m files from MMseqs2 search results into Python objects
- Easily slice MSAs while preserving insertions
- Easily combine multiple MSAs into a single a3m file for complex prediction
- Save manipulated MSAs to new a3m files
- Convert from fasta to a3m
- Coming soon: convert from a3m to fasta
If you have questions, find issues or bugs, or have suggestions, please open an issue on the GitHub page.
Installation
pip install a3mtools
or if you want an editable version:
git clone https://github.com/jacksonh1/a3mtools.git
cd a3mtools
pip install -e .
What these tools do:
The current implementation is designed to handle MSAs retrieved using the ColabFold MMseqs2 search tool. Those files have some strange quirks that I tried to account for. Mainly, query sequence names are 3-digit integers starting at 101: if an MSA has only 1 query sequence, it is named 101; if there are 2 query sequences, they are named 101 and 102, and so on. This comes into play when combining MSAs for predicting protein complexes.
When 2 MSAs are combined, their query sequences are joined into a single sequence, which is required to predict the complex structure. The query sequences are also added back to the MSA as individual sequences. MSAs are combined in unpaired format.
Here's an example (with no homologous sequences):
MSA1 a3m file:
#9 1
>101
ABCDEFGHI
MSA2 a3m file:
#17 1
>101
JKLMNOPQRSTUVWXYZ
MSA1 + MSA2 a3m file:
#9,17 1,1
>101 102
ABCDEFGHIJKLMNOPQRSTUVWXYZ
>101
ABCDEFGHI-----------------
>102
---------JKLMNOPQRSTUVWXYZ
The MSA1 + MSA2 a3m file could be used as input to many structure prediction tools to predict the structure of the complex formed by the 2 query sequences.
I don't know if maintaining this specific 101, 102, 103 ... naming scheme is strictly necessary, but we'll stick with it for now.
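The combination shown above can be sketched in plain Python. This is only an illustration for two single-chain MSAs, not the package's implementation: `combine_unpaired` and the list-of-tuples layout are made up for this example, and the tab separators in the first line and in the joined query name are an assumption based on the `101\t102` note in this README.

```python
def combine_unpaired(msa_a, msa_b):
    """Combine two single-chain MSAs in unpaired format.

    Each MSA is a list of (header, row) tuples with the query first.
    Hypothetical helper for illustration; not part of a3mtools.
    """
    query_a, query_b = msa_a[0][1], msa_b[0][1]
    len_a, len_b = len(query_a), len(query_b)

    # First line: lengths and cardinalities of both chains.
    info_line = f"#{len_a},{len_b}\t1,1"

    # The joined query goes first, named after both chains.
    rows = [("101\t102", query_a + query_b)]

    # Rows from the first MSA are padded with gaps on the right.
    rows.append(("101", query_a + "-" * len_b))
    rows.extend((h, r + "-" * len_b) for h, r in msa_a[1:])

    # Rows from the second MSA are padded with gaps on the left.
    rows.append(("102", "-" * len_a + query_b))
    rows.extend((h, "-" * len_a + r) for h, r in msa_b[1:])
    return info_line, rows


info_line, rows = combine_unpaired(
    [("101", "ABCDEFGHI")],
    [("101", "JKLMNOPQRSTUVWXYZ")],
)
print(info_line)
for header, row in rows:
    print(f">{header}\n{row}")
```

Lowercase insertions do not occupy alignment columns, so appending or prepending gaps equal to the other query's length keeps every row aligned to the joined query.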
The tools also allow for slicing the MSA and saving the MSA to a file.
When slicing or combining MSAs, insertions are maintained (see output of examples).
Usage
A more complete guide is planned. For now, you can install the package and run the following code to see some of the basic functionality.
See the demo notebook (demo.ipynb) for additional examples and usage.
import a3mtools
import a3mtools.examples as examples # examples that come installed with a3mtools
import an a3m file
msa = a3mtools.MSAa3m.from_a3m_file(examples.a3m_file1)
print(msa)
#9 1
>101
ABCDEFGHI
>ortho1
xxABCDxxEFGZZ
>ortho2
A--DxxE-GHIxxxx
>ortho3
----xxEFGH-
The input to the a3mtools.MSAa3m.from_a3m_file function is the path to an a3m file.
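For readers unfamiliar with the file layout, here is a rough sketch of how a file like the one printed above could be read with the standard library alone. `parse_a3m_text` is a hypothetical helper written for this README, not the parser used by a3mtools, and it assumes the optional `#` line appears before the first record.

```python
def parse_a3m_text(text):
    """Split a3m text into its '#' info line and a list of (header, row) tuples.

    Hypothetical helper for illustration; not part of a3mtools.
    """
    info_line = None
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#") and info_line is None and not records:
            # ColabFold-style first line, e.g. "#9 1".
            info_line = line
        elif line.startswith(">"):
            # New record: store the header without the leading ">".
            records.append([line[1:], ""])
        elif records:
            # Sequence data may span several lines; join them.
            records[-1][1] += line
    return info_line, [tuple(r) for r in records]


example = """#9 1
>101
ABCDEFGHI
>ortho1
xxABCDxxEFGZZ
"""
info_line, records = parse_a3m_text(example)
print(info_line)
print(records)
```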
slicing the alignment
print(msa[2:5])
#3 1
>101
CDE
>ortho1
CDxxE
>ortho2
-DxxE
>ortho3
--xxE
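The slice above counts positions along the query, so lowercase insertions do not advance the index. Below is a minimal sketch of that idea for a single row. `slice_a3m_row` is a hypothetical helper, and its rule (keep only insertions that sit strictly between two selected columns) is an assumption chosen to reproduce the outputs printed in this README; the package's actual rule may differ.

```python
def slice_a3m_row(row, start, stop):
    """Slice one a3m row by query (match) columns, keeping inner insertions.

    Hypothetical helper for illustration; not part of a3mtools.
    """
    out = []
    col = -1  # index of the most recent match/gap column seen
    for ch in row:
        if ch.islower():
            # Insertion after column `col`: keep it only when both the
            # column before it and the column after it are selected.
            if start <= col and col + 1 < stop:
                out.append(ch)
        else:
            # Uppercase residue or "-" occupies one query column.
            col += 1
            if start <= col < stop:
                out.append(ch)
    return "".join(out)


for row in ["ABCDEFGHI", "xxABCDxxEFGZZ", "A--DxxE-GHIxxxx", "----xxEFGH-"]:
    print(slice_a3m_row(row, 2, 5))
```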
concatenating alignments
msa2 = a3mtools.MSAa3m.from_a3m_file(examples.a3m_file2)
print(msa2)
#17 1
>101
JKLMNOPQRSTUVW
>ortho1
JKLMNOPQRSTUVWxxxxxx
>ortho2
------PQRSTUVW
print(msa + msa2)
#9,17 1,1
>101 102
ABCDEFGHIJKLMNOPQRSTUVW
>101
ABCDEFGHI--------------
>ortho1
xxABCDxxEFGZZ--------------
>ortho2
A--DxxE-GHIxxxx--------------
>ortho3
----xxEFGH---------------
>102
---------JKLMNOPQRSTUVW
>ortho1
---------JKLMNOPQRSTUVWxxxxxx
>ortho2
---------------PQRSTUVW
print(msa + msa2[2:5] + msa[5:])
#9,3,4 1,1,1
>101 102 103
ABCDEFGHILMNFGHI
>101
ABCDEFGHI-------
>ortho1
xxABCDxxEFGZZ-------
>ortho2
A--DxxE-GHIxxxx-------
>ortho3
----xxEFGH--------
>102
---------LMN----
>ortho1
---------LMN----
>103
------------FGHI
>ortho1
------------FGZZ
>ortho2
-------------GHI
>ortho3
------------FGH-
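With more than two blocks, each row needs gaps on both sides so that it lines up with its own segment of the joined query. The sketch below shows one way to compute that padding and the 101, 102, 103 ... names. Both helpers are hypothetical and written only for this README; they are not taken from the package.

```python
def pad_block_row(row, block_index, block_lengths):
    """Pad a row from one block with gaps so it spans the whole complex.

    Hypothetical helper for illustration; not part of a3mtools.
    """
    left = sum(block_lengths[:block_index])        # columns before this block
    right = sum(block_lengths[block_index + 1:])   # columns after this block
    return "-" * left + row + "-" * right


def query_names(n_blocks, first=101):
    """Return the per-block query names: 101, 102, 103, ..."""
    return [str(first + i) for i in range(n_blocks)]


block_lengths = [9, 3, 4]  # query lengths of msa, msa2[2:5], msa[5:]
print(query_names(len(block_lengths)))
print(pad_block_row("xxABCDxxEFGZZ", 0, block_lengths))
print(pad_block_row("LMN", 1, block_lengths))
print(pad_block_row("FGZZ", 2, block_lengths))
```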
saving the alignment to a file
complex_msa = msa + msa2
complex_msa.save("example_complex.a3m")
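Before handing a saved file to a prediction tool, it can be useful to confirm that every row spans the same number of alignment columns as the query. In a3m, lowercase letters are insertions and do not count as columns. The check below is a small standalone sketch; `match_columns` and `rows_with_wrong_width` are hypothetical names, not functions provided by a3mtools.

```python
def match_columns(row):
    """Count alignment columns: every character except lowercase insertions."""
    return sum(1 for ch in row if not ch.islower())


def rows_with_wrong_width(records):
    """Return headers of rows whose column count differs from the query's.

    `records` is a list of (header, row) tuples with the query first.
    """
    query_width = match_columns(records[0][1])
    return [h for h, row in records[1:] if match_columns(row) != query_width]


# Rows taken from the combined example above (second query has 14 residues).
records = [
    ("101\t102", "ABCDEFGHI" + "JKLMNOPQRSTUVW"),
    ("101", "ABCDEFGHI" + "-" * 14),
    ("ortho1", "xxABCDxxEFGZZ" + "-" * 14),
    ("102", "-" * 9 + "JKLMNOPQRSTUVW"),
    ("ortho2", "-" * 9 + "------PQRSTUVW"),
]
print(rows_with_wrong_width(records))
```

An empty list means every row is consistent with the query.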
creating an empty MSAa3m object
empty_msa = a3mtools.MSAa3m.empty_MSA('ABCDEFG')
print(empty_msa)
#7 1
>101
ABCDEFG
>101
ABCDEFG
Notice that the query sequence appears twice in the empty MSAa3m object. This mimics the ColabFold behavior. I do not know whether it is actually necessary.
importing a fasta file
fasta_msa = a3mtools.MSAfasta.from_fasta_file(examples.fasta_file1)
print(fasta_msa)
>q
LVT---FLAGCQ---
>a
LVTTTTFL--CQQQQ
>b
LVTTTTFLAGCQQQQ
>c
LVT---FLAGCQQQQ
convert MSAfasta to MSAa3m
To convert a fasta to an a3m, you have to select a query sequence, since a3m files are all formatted relative to a query sequence.
y = fasta_msa.to_a3m(query_header="q")
print(y)
#9 1
>101
LVTFLAGCQ
>a
LVTtttFL--CQqqq
>b
LVTtttFLAGCQqqq
>c
LVTFLAGCQqqq
Notice the difference in formatting when we choose a different query sequence:
y = fasta_msa.to_a3m(query_header="a")
print(y)
#13 1
>101
LVTTTTFLCQQQQ
>q
LVT---FLagCQ---
>b
LVTTTTFLagCQQQQ
>c
LVT---FLagCQQQQ
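The two conversions above follow from one per-column rule: where the query has a residue, the row's character is kept (residue or gap); where the query has a gap, a residue in the row becomes a lowercase insertion and a gap is dropped. Here is a minimal sketch of that rule. `aligned_row_to_a3m` is a hypothetical helper, not the package's `to_a3m` implementation, and it assumes both rows come from the same alignment and have equal length.

```python
def aligned_row_to_a3m(query_aligned, row_aligned):
    """Re-express one aligned fasta row relative to the chosen query row.

    Hypothetical helper for illustration; not part of a3mtools.
    Assumes both strings have the same length.
    """
    out = []
    for q, r in zip(query_aligned, row_aligned):
        if q == "-":
            # Column absent from the query: a residue here is an insertion,
            # written in lowercase; a gap is simply dropped.
            if r != "-":
                out.append(r.lower())
        else:
            # Column present in the query: keep the residue or gap.
            out.append(r.upper())
    return "".join(out)


q = "LVT---FLAGCQ---"
a = "LVTTTTFL--CQQQQ"
print(aligned_row_to_a3m(q, q))  # the query itself, with gaps removed
print(aligned_row_to_a3m(q, a))  # row "a" relative to query "q"
print(aligned_row_to_a3m(a, q))  # row "q" relative to query "a"
```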
important notes
- slice numbering is relative to the query sequence
- query sequence is always the first sequence in the MSA, and is named 101 or some combination of concatenated queries if the MSA is the result of concatenation (e.g. 101\t102)
- slicing with a step size other than 1 is not supported yet and probably will not be supported
- MSAs are combined in unpaired format
- combining MSAs in paired format is not supported yet
future features:
- documentation on readthedocs
- examples and code
- convert between fasta and a3m
- convert between a3m and fasta
- allow for more generic naming of query sequence
- add better test functions
- an option for combining MSAs in paired format