Cython bindings and Python interface to FAMSA, an algorithm for ultra-scale multiple sequence alignments.

🐍🧮 PyFAMSA

⚠️ This package is based on FAMSA 2.

🗺️ Overview

FAMSA is a method published in 2016 by Deorowicz et al.[1] for large-scale multiple sequence alignments. It uses state-of-the-art time and memory optimizations as well as a fast guide tree heuristic to reach very high performance and accuracy.

PyFAMSA is a Python module that provides bindings to FAMSA using Cython. It implements a user-friendly, Pythonic interface to align protein sequences using different parameters and access results directly. It interacts with the FAMSA library interface, which has the following advantages:

  • single dependency: pyfamsa is distributed as a Python package, so you can add it as a dependency to your project, and stop worrying about the FAMSA binary being present on the end-user machine.
  • no intermediate files: Everything happens in memory, in a Python object you control, so you don't have to invoke the FAMSA CLI using a sub-process and temporary files.
  • friendly interface: The different guide tree build methods and heuristics can be selected from the Python code with a simple keyword argument when configuring a new Aligner.
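
As a rough sketch of that keyword-based configuration, the helper below validates a guide tree method and a heuristic before building an Aligner. Only guide_tree="upgma" is shown on this page; the other method names, the tree_heuristic keyword and its values, and the helper names aligner_options and make_aligner are assumptions to check against the PyFAMSA API reference.

```python
# Accepted names are assumptions; only "upgma" appears on this page.
# Check them against the PyFAMSA API reference before relying on them.
GUIDE_TREES = {"sl", "slink", "upgma", "nj"}
HEURISTICS = {None, "medoidtree", "parttree"}


def aligner_options(guide_tree="upgma", tree_heuristic=None):
    """Validate both choices and return them as Aligner keyword arguments."""
    if guide_tree not in GUIDE_TREES:
        raise ValueError(f"unknown guide tree method: {guide_tree!r}")
    if tree_heuristic not in HEURISTICS:
        raise ValueError(f"unknown tree heuristic: {tree_heuristic!r}")
    return {"guide_tree": guide_tree, "tree_heuristic": tree_heuristic}


def make_aligner(**choices):
    """Build an Aligner from validated choices (requires pyfamsa)."""
    # Imported lazily so that aligner_options() works without pyfamsa.
    from pyfamsa import Aligner
    return Aligner(**aligner_options(**choices))
```

Under these assumptions, make_aligner(guide_tree="upgma") would be equivalent to calling Aligner(guide_tree="upgma") directly, with a mistyped method name rejected before the aligner is built.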

🔧 Installing

PyFAMSA can be installed directly from PyPI, which hosts some pre-built wheels for the x86-64 architecture (Linux/OSX) and the Aarch64 architecture (Linux only), as well as the code required to compile from source with Cython:

$ pip install pyfamsa

Otherwise, have a look at the Installation page of the online documentation.

💡 Example

Let's create some sequences in memory, align them using the UPGMA method (without any heuristic), and simply print the alignment on screen:

from pyfamsa import Aligner, Sequence

sequences = [
    Sequence(b"Sp8",  b"GLGKVIVYGIVLGTKSDQFSNWVVWLFPWNGLQIHMMGII"),
    Sequence(b"Sp10", b"DPAVLFVIMLGTITKFSSEWFFAWLGLEINMMVII"),
    Sequence(b"Sp26", b"AAAAAAAAALLTYLGLFLGTDYENFAAAAANAWLGLEINMMAQI"),
    Sequence(b"Sp6",  b"ASGAILTLGIYLFTLCAVISVSWYLAWLGLEINMMAII"),
    Sequence(b"Sp17", b"FAYTAPDLLLIGFLLKTVATFGDTWFQLWQGLDLNKMPVF"),
    Sequence(b"Sp33", b"PTILNIAGLHMETDINFSLAWFQAWGGLEINKQAIL"),
]

aligner = Aligner(guide_tree="upgma")
msa = aligner.align(sequences)

for sequence in msa:
    print(sequence.id.decode().ljust(10), sequence.sequence.decode())

This should output the following:

Sp10       --------DPAVLFVIMLGTIT-KFS--SEWFFAWLGLEINMMVII
Sp17       ---FAYTAPDLLLIGFLLKTVA-TFG--DTWFQLWQGLDLNKMPVF
Sp26       AAAAAAAAALLTYLGLFLGTDYENFA--AAAANAWLGLEINMMAQI
Sp33       -------PTILNIAGLHMETDI-NFS--LAWFQAWGGLEINKQAIL
Sp6        ------ASGAILTLGIYLFTLCAVIS--VSWYLAWLGLEINMMAII
Sp8        ------GLGKVIVYGIVLGTKSDQFSNWVVWLFPWNGLQIHMMGII
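
Once the rows are available, a few small helpers can sanity-check the alignment and serialize it. The sketch below is not part of PyFAMSA: Row is a stand-in for the records yielded when iterating over msa, assuming only the id and sequence byte attributes used above, and is_rectangular, ungap and to_fasta are hypothetical helper names.

```python
from collections import namedtuple

# Stand-in for the records yielded by iterating over `msa`; only the `id`
# and `sequence` byte attributes from the example above are assumed.
Row = namedtuple("Row", ["id", "sequence"])


def is_rectangular(rows):
    """Return True when every aligned row has the same number of columns."""
    return len({len(row.sequence) for row in rows}) <= 1


def ungap(row, gap=b"-"):
    """Return the row's residues with gap characters removed."""
    return row.sequence.replace(gap, b"")


def to_fasta(rows, width=60):
    """Format aligned rows as FASTA text, wrapping sequence lines at `width`."""
    lines = []
    for row in rows:
        lines.append(">" + row.id.decode())
        text = row.sequence.decode()
        lines.extend(text[i:i + width] for i in range(0, len(text), width))
    return "".join(line + "\n" for line in lines)


# Two rows copied from the output above.
rows = [
    Row(b"Sp10", b"--------DPAVLFVIMLGTIT-KFS--SEWFFAWLGLEINMMVII"),
    Row(b"Sp17", b"---FAYTAPDLLLIGFLLKTVA-TFG--DTWFQLWQGLDLNKMPVF"),
]
print(is_rectangular(rows))
print(to_fasta(rows), end="")
```

Comparing ungap(row) with the corresponding input sequence is a cheap way to confirm that the alignment only inserted gaps and left the residues unchanged.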

🧶 Thread-safety

Aligner objects are thread-safe, and the align method is re-entrant. You could batch process several alignments in parallel using a ThreadPool with a single aligner object:

import glob
import multiprocessing.pool
import Bio.SeqIO
from pyfamsa import Aligner, Sequence

families = [
    [ Sequence(r.id.encode(), r.seq.encode()) for r in Bio.SeqIO.parse(file, "fasta") ]
    for file in glob.glob("pyfamsa/tests/data/*.faa")
]

aligner = Aligner()
with multiprocessing.pool.ThreadPool() as pool:
    alignments = pool.map(aligner.align, families)
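
The list built above drops the file names, so it can help to key each result by its source. A minimal sketch, assuming a hypothetical helper align_named that accepts any callable taking a list of sequences (such as the align method of a shared Aligner):

```python
import multiprocessing.pool


def align_named(align, families, threads=None):
    """Align each named family in a thread pool and key results by name.

    `align` is any callable taking a list of sequences; `families` maps a
    name (for instance a file path) to such a list.
    """
    names = list(families)
    with multiprocessing.pool.ThreadPool(threads) as pool:
        # map() returns results in input order, so zip() pairs them back up.
        results = pool.map(align, [families[name] for name in names])
    return dict(zip(names, results))
```

With PyFAMSA installed, this could be called as align_named(aligner.align, families), where families is a dictionary mapping each FASTA path to its list of Sequence objects.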

🔎 See Also

Done with your protein alignment? You may be interested in trimming it: in that case, you could use the pytrimal Python package, which wraps trimAl 2.0. Or perhaps you want to build an HMM from the alignment? Then maybe have a look at pyhmmer, a Python package which wraps HMMER.

💭 Feedback

⚠️ Issue Tracker

Found a bug? Have an enhancement request? Head over to the GitHub issue tracker if you need to report or ask something. If you are filing a bug report, please include as much information as you can about the issue, and try to recreate the same bug in a simple, easily reproducible situation.

🏗️ Contributing

Contributions are more than welcome! See CONTRIBUTING.md for more details.

📋 Changelog

This project adheres to Semantic Versioning and provides a changelog in the Keep a Changelog format.

⚖️ License

This library is provided under the GNU General Public License v3.0. FAMSA is developed by the REFRESH Bioinformatics Group and is distributed under the terms of the GPLv3 as well. See vendor/FAMSA/LICENSE for more information. In addition, FAMSA vendors several libraries for compatibility, all of which are redistributed with PyFAMSA under their own terms: atomic_wait (MIT License), mimalloc (MIT License), libdeflate (MIT License), Boost (Boost Software License).

This project is in no way affiliated, sponsored, or otherwise endorsed by the FAMSA authors. It was developed by Martin Larralde during his PhD project at the European Molecular Biology Laboratory in the Zeller team.

📚 References

  • [1] Deorowicz, Sebastian, Debudaj-Grabysz, Agnieszka & Gudyś, Adam. ‘FAMSA: Fast and accurate multiple sequence alignment of huge protein families’. Sci Rep 6, 33964 (2016). doi:10.1038/srep33964
