
Building and managing MSAs prior to structure inference

Project description

MSAF: Multiple Sequence Alignment/AlphaFold

Streamlining the MSA building stages

MSAF gives you control over the database search and the bundling of MSA files prior to structure inference.

Installation

External dependencies

MSAF uses the following tools:

  • mmseqs2 for database search
  • mafft for multiple sequence alignment

You will need both tools installed.
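Before a first run, it can save time to confirm that both tools are reachable. The following is a minimal sketch using only the standard library; the helper name `missing_tools` is hypothetical and not part of MSAF. It accepts bare command names (resolved on PATH) as well as absolute paths, such as those listed under `executables` in the configuration file.

```python
import shutil

# External tools MSAF relies on (bare names are looked up on PATH).
REQUIRED_TOOLS = ("mafft", "mmseqs")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the entries of `tools` that cannot be resolved to an executable.

    shutil.which() searches PATH for bare names and checks an entry
    containing a directory component (e.g. /usr/local/bin/mafft) directly.
    """
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing external dependencies:", ", ".join(missing))
    else:
        print("mafft and mmseqs were found.")
```

For non-standard install locations, pass the paths from your configuration instead, e.g. `missing_tools(["/usr/local/bin/mafft", "/opt/homebrew/bin/mmseqs"])`.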

Python package

Install the package with pip:

pip install msaf2

Global setup

MSAF often requires a configuration file, passed with the -c flag. This configuration file is in YAML format and has the following structure:

databases:
  - /path/to/databases/mmseqs
executables:
  mafft: /usr/local/bin/mafft
  mmseqs: /opt/homebrew/bin/mmseqs
settings:
  cache: /path/to/msaf/cache
cocktails:
  test:
    ingredients:
      - target: swissprot
        label: pif.sto
      - target: uniprot
        label: paf.a3m

Where:

  • databases is a list of folders in which MSAF recursively looks for mmseqs databases
  • executables maps each external dependency to the path of its executable
  • cache points to a folder where MSAF stores its working files; it MUST already exist
  • cocktails is a dictionary of recipes
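A quick sanity check of a configuration file can catch the most common mistakes (a missing section, a cache folder that does not exist yet, an ingredient without a target or label) before a long search starts. The sketch below is hypothetical and is not MSAF's own validation; it assumes the YAML has already been loaded into a Python dict, for example with PyYAML's `yaml.safe_load`.

```python
from pathlib import Path

# Top-level sections used in the example configuration.
EXPECTED_KEYS = ("databases", "executables", "settings", "cocktails")


def validate_config(config: dict) -> list[str]:
    """Return human-readable problems found in an already-loaded config dict."""
    problems = []
    for key in EXPECTED_KEYS:
        if key not in config:
            problems.append(f"missing top-level key: {key}")

    if not isinstance(config.get("databases") or [], list):
        problems.append("databases must be a list of folders")

    cache = (config.get("settings") or {}).get("cache")
    if cache is None:
        problems.append("settings.cache is not set")
    elif not Path(cache).is_dir():
        # The README requires the cache folder to exist beforehand.
        problems.append(f"cache folder does not exist: {cache}")

    # Every ingredient of every recipe needs both a target and a label.
    for name, recipe in (config.get("cocktails") or {}).items():
        for i, item in enumerate((recipe or {}).get("ingredients") or []):
            if "target" not in item or "label" not in item:
                problems.append(f"cocktail {name!r}, ingredient {i}: needs target and label")
    return problems
```

With PyYAML installed, `validate_config(yaml.safe_load(open("config.yaml")))` lists any problems found; an empty list means none of these checks failed.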

A configuration template file can be generated with the following command:

python -m msaf2 --generate

You can then edit it to match your setup.

MSAF recipes

Recipes are declared in the configuration file. A recipe is characterized by a name (e.g. test) and a list of ingredients. Ingredients define the database searches and how their results are saved, as a list of target and label pairs: the target key defines the database to search, and the label defines the resulting MSA file name (and, through its extension, its format). Recipes may also feature an optional PDQT parameter which, if set to TRUE, wraps all a3m files in an aligned.pdqt file.

In the example above, the test recipe triggers a search against swissprot and uniprot for every supplied query.

  • The result of the swissprot search is saved in Stockholm format in a file named pif.sto
  • The result of the uniprot search is saved in a3m format in a file named paf.a3m
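The mapping from a recipe to its individual searches can be pictured as below. This is a sketch, not MSAF's implementation: the helper `expand_recipe` is hypothetical, and the extension-to-format table only covers the two formats that appear in this README (.sto for Stockholm, .a3m for a3m); MSAF may accept others.

```python
from pathlib import PurePath

# Assumed mapping from label extension to MSA format, limited to the
# two examples in this README.
FORMATS = {".sto": "stockholm", ".a3m": "a3m"}


def expand_recipe(cocktails: dict, name: str) -> list[tuple[str, str, str]]:
    """Turn a recipe into (database, output file, format) search jobs."""
    jobs = []
    for ingredient in cocktails[name]["ingredients"]:
        label = ingredient["label"]
        suffix = PurePath(label).suffix  # e.g. ".sto"
        if suffix not in FORMATS:
            raise ValueError(f"unsupported MSA format for label {label!r}")
        jobs.append((ingredient["target"], label, FORMATS[suffix]))
    return jobs


# The "test" recipe from the example configuration, as a Python dict.
cocktails = {
    "test": {
        "ingredients": [
            {"target": "swissprot", "label": "pif.sto"},
            {"target": "uniprot", "label": "paf.a3m"},
        ]
    }
}
print(expand_recipe(cocktails, "test"))
# [('swissprot', 'pif.sto', 'stockholm'), ('uniprot', 'paf.a3m', 'a3m')]
```

Each tuple corresponds to one database search, run once per query.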

Usage

List available databases

At startup, MSAF recursively searches every folder listed under databases in the configuration file for mmseqs database files (<database_name>_h, <database_name>_h.index, <database_name>.lookup, <database_name>.index).

The registered <database_name> entries can be displayed with:

msaf2 config.yaml --list
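The discovery step can be approximated as follows. This hypothetical sketch (`find_databases` is not an MSAF function) walks each folder recursively and registers a database name when the four files listed above are all present, reading the header index as <database_name>_h.index, the name `mmseqs createdb` gives it. MSAF's exact matching rules may differ.

```python
from pathlib import Path

# Files expected next to a database named <name> (appended to the prefix).
SUFFIXES = ("_h", "_h.index", ".lookup", ".index")


def find_databases(folders: list[str]) -> dict[str, Path]:
    """Recursively scan folders and map database name -> database path prefix."""
    found = {}
    for folder in folders:
        for index in Path(folder).rglob("*.index"):
            prefix = index.with_suffix("")  # "swissprot.index" -> "swissprot"
            # Header indexes such as "swissprot_h.index" yield the prefix
            # "swissprot_h", which fails this check because "swissprot_h_h"
            # does not exist, so they are not registered as databases.
            if all(Path(f"{prefix}{s}").is_file() for s in SUFFIXES):
                found[prefix.name] = prefix
    return found
```

For instance, `find_databases(["/path/to/databases/mmseqs"])` returns the names that a listing should report for that folder.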

Run a search

msaf2 -c config.yaml --query <abs_path_query1.fasta> <abs_path_query2.fasta> --bp test

Here --bp refers to a recipe defined in the configuration file, and --query takes the absolute path(s) of the query sequence file(s) in FASTA format.
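Since --query expects absolute paths, scripted runs can resolve relative paths before calling msaf2. A minimal sketch; the helper `build_search_command` is hypothetical and only uses the -c, --query and --bp flags documented here.

```python
import os


def build_search_command(config: str, queries: list[str], recipe: str) -> list[str]:
    """Assemble an msaf2 search command, turning query paths into absolute paths."""
    absolute = [os.path.abspath(q) for q in queries]
    return ["msaf2", "-c", config, "--query", *absolute, "--bp", recipe]


if __name__ == "__main__":
    cmd = build_search_command("config.yaml", ["queries/A.fasta", "queries/B.fasta"], "test")
    print(" ".join(cmd))
    # To launch the search, pass cmd to subprocess.run(cmd, check=True).
```

Returning an argument list rather than a single string avoids shell quoting issues with paths that contain spaces.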

Multimer search

Results are saved in the --output folder (msas by default), in subfolders named with sequential one-letter chain identifiers (A, B, ...) following the order of the query files. If the same file is provided more than once as a query, only one folder is created. Hence, the results of a homodimer search are stored under a single A/ subfolder.
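The folder-naming rule can be illustrated with a short sketch. It is an interpretation of the description above, not MSAF's code: the helper `assign_chain_folders` is hypothetical, and the limit of 26 distinct files is an assumption of this sketch.

```python
import string


def assign_chain_folders(queries: list[str]) -> dict[str, str]:
    """Map each distinct query file to a one-letter chain folder (A, B, C, ...).

    Files keep the order in which they first appear; a repeated file reuses
    its existing letter, so a homodimer (same file twice) only gets "A".
    """
    folders: dict[str, str] = {}
    for query in queries:
        if query not in folders:
            if len(folders) >= len(string.ascii_uppercase):
                raise ValueError("more than 26 distinct query files")
            folders[query] = string.ascii_uppercase[len(folders)]
    return folders


print(assign_chain_folders(["/data/x.fasta", "/data/x.fasta"]))
# {'/data/x.fasta': 'A'}
print(assign_chain_folders(["/data/x.fasta", "/data/y.fasta", "/data/x.fasta"]))
# {'/data/x.fasta': 'A', '/data/y.fasta': 'B'}
```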

Wrap a pre-existing folder of MSAs

If a pre-existing folder is passed with the --pdqt flag, the a3m MSA files it contains are archived in an aligned.pdqt file:

msaf2 --pdqt <results_a3m_folder>

Miscellaneous

How to format a FASTA database file, from the MMseqs2 documentation:

Searching

Before searching, you need to convert your FASTA file containing query sequences and target sequences into a sequence DB. You can use the query database examples/QUERY.fasta and target database examples/DB.fasta to test the search workflow:

mmseqs createdb examples/QUERY.fasta queryDB
mmseqs createdb examples/DB.fasta targetDB

These calls should generate five files each, e.g. queryDB, queryDB_h and their corresponding index files queryDB.index and queryDB_h.index, plus queryDB.lookup, from the FASTA QUERY.fasta input sequences. The queryDB and queryDB.index files contain the amino acid sequences, while the queryDB_h and queryDB_h.index files contain the FASTA headers. The queryDB.lookup file contains a list of tab-separated fields that map the internal identifiers to the FASTA identifiers.

For the next step, an index file of the targetDB is computed for a fast read-in. It is recommended to compute the index if the targetDB is reused for several searches. If only a few searches against this database will be done, this step should be skipped.

mmseqs createindex targetDB tmp

This call will create a targetDB.idx file. Only one index per database is possible. Then generate a directory for temporary files. MMseqs2 can produce a high IO load on the file system, so it is recommended to create this temporary folder on a local drive.
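The MMseqs2 steps quoted above can be scripted when several databases need preparing. This is a hedged sketch: the helper `prepare_target_db` is hypothetical, and it only runs the `mmseqs createdb` and `mmseqs createindex` calls exactly as shown in the excerpt. A `dry_run` flag returns the commands without executing them.

```python
import subprocess
from pathlib import Path


def prepare_target_db(fasta: str, db_name: str, tmp_dir: str = "tmp",
                      build_index: bool = True, mmseqs: str = "mmseqs",
                      dry_run: bool = False) -> list[list[str]]:
    """Convert a FASTA file into an MMseqs2 database, optionally indexing it.

    Returns the commands; with dry_run=True nothing is executed.
    """
    commands = [[mmseqs, "createdb", fasta, db_name]]
    if build_index:
        # Indexing pays off only when the database is reused for several searches.
        commands.append([mmseqs, "createindex", db_name, tmp_dir])
    if not dry_run:
        if build_index:
            # Temporary folder for createindex; preferably on a local drive.
            Path(tmp_dir).mkdir(parents=True, exist_ok=True)
        for cmd in commands:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return commands


if __name__ == "__main__":
    for cmd in prepare_target_db("examples/DB.fasta", "targetDB", dry_run=True):
        print(" ".join(cmd))
```

For MSAF to pick up the result, the generated database files should sit inside one of the folders listed under databases in the configuration file.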



Download files


Source Distribution

msaf2-0.5.0.tar.gz (60.4 MB)

Uploaded Source

Built Distribution


msaf2-0.5.0-py3-none-any.whl (15.0 kB)

Uploaded Python 3

File details

Details for the file msaf2-0.5.0.tar.gz.

File metadata

  • Download URL: msaf2-0.5.0.tar.gz
  • Size: 60.4 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.4.17

File hashes

Hashes for msaf2-0.5.0.tar.gz
Algorithm Hash digest
SHA256 bccba16e2fb26853ba4731ca01478ef417fc6bca3823f2d56d2e097bb35e45da
MD5 1d7f8c4bb57ad5fcda760128a2bd9681
BLAKE2b-256 1eae9b41c5482006cd1c91ae4b28fb428fa8ee681f445d3980e7ba3213c753de


File details

Details for the file msaf2-0.5.0-py3-none-any.whl.

File metadata

  • Download URL: msaf2-0.5.0-py3-none-any.whl
  • Size: 15.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.4.17

File hashes

Hashes for msaf2-0.5.0-py3-none-any.whl
Algorithm Hash digest
SHA256 573b2517560df9927060ade241635f464f32f49b9ee53fc3769a1af96c6e577d
MD5 6a426016d43609d89e7f74b72c1a4f7c
BLAKE2b-256 f3454cb098b345ef0b98942d67a6baf02ae8a6aaf9286e3473b70bad49ba7b69

