Cython bindings and Python interface to FastANI, a method for fast whole-genome similarity estimation.

🐍⏩🧬 PyFastANI

🗺️ Overview

FastANI is a method published in 2018 by Jain et al. for high-throughput computation of whole-genome Average Nucleotide Identity (ANI). It uses MashMap to compute orthologous mappings without the need for expensive alignments.

pyfastani is a Python module, implemented using the Cython language, that provides bindings to FastANI. It directly interacts with the FastANI internals, which has the following advantages over CLI wrappers:

  • simpler compilation: FastANI requires several additional libraries, which make compilation of the original binary non-trivial. In PyFastANI, libraries that were needed for threading or I/O are provided as stubs, and Boost::math headers are vendored so you can build the package without hassle. Or even better, just install from one of the provided wheels!
  • single dependency: If your software or your analysis pipeline is distributed as a Python package, you can add pyfastani as a dependency to your project, and stop worrying about the FastANI binary being present on the end-user machine.
  • sans I/O: Everything happens in memory, in Python objects you control, making it easier to pass your sequences to FastANI without needing to write them to a temporary file.
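
To make the sans-I/O point concrete, here is a minimal sketch of turning FASTA text that already lives in memory into byte strings, with no temporary file involved. The `iter_fasta` helper and the sample records are illustrative and not part of PyFastANI; the resulting byte strings are the kind of input that the examples further down pass to `add_draft`.

```python
import io

def iter_fasta(handle):
    """Yield (name, sequence_bytes) pairs from a text handle in FASTA format."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # a new header closes the previous record, if any
            if name is not None:
                yield name, "".join(chunks).encode("ascii")
            # keep only the identifier, dropping the free-text description
            fields = line[1:].split(None, 1)
            name, chunks = (fields[0] if fields else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks).encode("ascii")

# FASTA text held in memory (hypothetical records, for illustration only)
text = ">contig1 example\nACGT\nTTGA\n>contig2\nGGCC\n"
contigs = [seq for _, seq in iter_fasta(io.StringIO(text))]
print(contigs)  # [b'ACGTTTGA', b'GGCC']
```

The same generator works on any text handle, so sequences fetched over the network or produced by another tool can be handed over without touching the disk.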

This library is still experimental and a work in progress, but it should already have enough features to be used in a standard pipeline.

🔧 Installing

PyFastANI can be installed directly from PyPI, which hosts some pre-built CPython wheels for x86-64 Unix platforms, as well as the code required to compile from source with Cython:

$ pip install pyfastani

In the event you have to compile the package from source, all the required libraries are vendored in the source distribution, so you'll only need a C/C++ compiler.

Otherwise, PyFastANI is also available as a Bioconda package:

$ conda install -c bioconda pyfastani

💡 Example

The following snippets show how to compute the ANI between two genomes, with the reference being a draft genome. For one-to-many or many-to-many searches, simply add additional references with sketch.add_draft before indexing. Any name can be given to the reference sequences; it only affects the name attribute of the hits returned for a query.
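
As a rough sketch of the one-to-many case, the helpers below add several named draft genomes before indexing, then pick the closest reference from a list of hits. `index_references` and `best_hit` are hypothetical names, not part of PyFastANI; they only assume an object with the `add_draft` and `index` methods, and hits with the `identity` attribute, as used in the examples of this section.

```python
def index_references(sketch, references):
    """Add each named draft genome to `sketch`, then index it.

    `references` maps a reference name to an iterable of contigs
    (byte strings or any other buffer the sketch accepts).
    """
    for name, contigs in references.items():
        sketch.add_draft(name, contigs)
    # indexing happens once, after all references have been added
    return sketch.index()

def best_hit(hits):
    """Return the hit with the highest identity, or None if there are no hits."""
    return max(hits, key=lambda hit: hit.identity, default=None)
```

With PyFastANI, `sketch` would be a `pyfastani.Sketch()` and `hits` the result of querying the returned mapper; the `name` attribute of the best hit then tells which reference was closest.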

🔬 Biopython

Biopython does not give direct access to the sequence, so we need to convert it to bytes first with the bytes builtin function. For older versions of Biopython (earlier than 1.79), use record.seq.encode() instead of bytes(record.seq).

import pyfastani
import Bio.SeqIO

sketch = pyfastani.Sketch()

# add a single draft genome to the sketch
ref = list(Bio.SeqIO.parse("vendor/FastANI/data/Shigella_flexneri_2a_01.fna", "fasta"))
sketch.add_draft("S. flexneri", (bytes(record.seq) for record in ref))

# index the sketch and get a mapper
mapper = sketch.index()

# read the query and query the mapper
query = Bio.SeqIO.read("vendor/FastANI/data/Escherichia_coli_str_K12_MG1655.fna", "fasta")
hits = mapper.query_genome(bytes(query.seq))

for hit in hits:
    print("E. coli K12 MG1655", hit.name, hit.identity, hit.matches, hit.fragments)
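
If a script has to run against both older and newer Biopython releases, the choice between the two conversions can be wrapped in a small helper. `seq_to_bytes` is a hypothetical name, not part of PyFastANI or Biopython, and the check relies on the assumption that releases supporting `bytes(record.seq)` define `__bytes__` on the sequence object.

```python
def seq_to_bytes(seq):
    """Return the raw bytes of a sequence-like object."""
    if isinstance(seq, (bytes, bytearray)):
        # already a byte buffer: normalize to an immutable bytes object
        return bytes(seq)
    if hasattr(seq, "__bytes__"):
        # assumed to hold for Biopython releases where bytes(record.seq) works
        return bytes(seq)
    # fallback for older releases, which expose encode() instead (as does str)
    return seq.encode()
```

In the example above, `bytes(record.seq)` and `bytes(query.seq)` could then be replaced with `seq_to_bytes(record.seq)` and `seq_to_bytes(query.seq)`.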

🧪 Scikit-bio

Scikit-bio gives direct access to the sequence as a numpy array, but exposes the values as byte strings by default. To make them readable as char (for compatibility with the C code), they must be viewed as unsigned bytes with seq.values.view('B').

import pyfastani
import skbio.io

sketch = pyfastani.Sketch()

ref = list(skbio.io.read("vendor/FastANI/data/Shigella_flexneri_2a_01.fna", "fasta"))
sketch.add_draft("Shigella_flexneri_2a_01", (seq.values.view('B') for seq in ref))

mapper = sketch.index()

# read the query and query the mapper
query = next(skbio.io.read("vendor/FastANI/data/Escherichia_coli_str_K12_MG1655.fna", "fasta"))
hits = mapper.query_genome(query.values.view('B'))

for hit in hits:
    print("E. coli K12 MG1655", hit.name, hit.identity, hit.matches, hit.fragments)
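
To see what the `view('B')` call does, the sketch below reproduces it with NumPy alone. The `values` array is a stand-in built by hand, on the assumption that scikit-bio stores sequence characters as an array of 1-byte strings; it is not taken from a real `skbio` object.

```python
import numpy as np

# stand-in for seq.values: one 1-byte string per nucleotide (assumed layout)
values = np.frombuffer(b"ACGT", dtype="S1")

# view("B") reinterprets the same memory as unsigned 8-bit integers;
# no data is copied, only the dtype changes
codes = values.view("B")

print(codes.dtype, codes.tolist())       # uint8 [65, 67, 71, 84]
print(np.shares_memory(values, codes))   # True
```

Because the view shares memory with the original array, the conversion costs nothing even for large genomes.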

💭 Feedback

⚠️ Issue Tracker

Found a bug? Have an enhancement request? Head over to the GitHub issue tracker if you need to report or ask something. If you are filing a bug report, please include as much information as you can about the issue, and try to recreate the same bug in a simple, easily reproducible situation.

🏗️ Contributing

Contributions are more than welcome! See CONTRIBUTING.md for more details.

⚖️ License

This library is provided under the MIT License.

The FastANI code was written by Chirag Jain and is distributed under the terms of the Apache License 2.0, unless otherwise specified in vendored sources. See vendor/FastANI/LICENSE for more information. The cpu_features code was written by Guillaume Chatelet and is distributed under the terms of the Apache License 2.0. See vendor/cpu_features/LICENSE for more information. The Boost::math headers were written by Boost Libraries contributors and are distributed under the terms of the Boost Software License. See vendor/boost-math/LICENSE for more information.

This project is in no way affiliated with, sponsored by, or otherwise endorsed by the original FastANI authors. It was developed by Martin Larralde during his PhD project at the European Molecular Biology Laboratory in the Zeller team.
