Alignment generator using mappy and Python

Project description

FlowAlign

FlowAlign is a package that simplifies aligning viral sequences to a reference, using only Python packages. (Under the hood it uses minimap2, so it is fast.)

Installation

pip install flowalign

Usage (command-line)

flowalign sequences.fasta --reference wuhCor1.fa --output aligned.fa

If you omit the --output parameter, the aligned sequences are written to STDOUT.

Usage (Python)

First we download the reference and some unaligned sequences to align against it:

wget https://hgdownload.soe.ucsc.edu/goldenPath/wuhCor1/bigZips/wuhCor1.fa.gz && gunzip wuhCor1.fa.gz
wget https://data.nextstrain.org/files/ncov/open/global/sequences.fasta.xz && xz --decompress sequences.fasta.xz

Then we write a simple Python script, creating an iterator called aligned that will yield reference-aligned versions of these unaligned sequences:

import flowalign
aligned = flowalign.yield_aligned(input="sequences.fasta", reference="wuhCor1.fa")
for name, aligned_sequence in aligned:
    print(">"+name)
    print(aligned_sequence)
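The loop above prints each (name, aligned_sequence) pair as FASTA. If you would rather write straight to a file, or wrap long sequences, a small helper that consumes any iterator of such pairs is enough. This is a minimal sketch: write_fasta is a hypothetical name, not part of flowalign, and the demo uses stand-in records so it runs without the package installed.

```python
import io
from typing import Iterable, Optional, TextIO, Tuple


def write_fasta(records: Iterable[Tuple[str, str]], handle: TextIO,
                width: Optional[int] = None) -> int:
    """Write (name, sequence) pairs to an open text handle as FASTA.

    If width is given, each sequence is wrapped onto lines of at most
    that many characters. Returns the number of records written.
    """
    count = 0
    for name, seq in records:
        handle.write(f">{name}\n")
        if width:
            for i in range(0, len(seq), width):
                handle.write(seq[i:i + width] + "\n")
        else:
            handle.write(seq + "\n")
        count += 1
    return count


# Stand-in records so the sketch runs without flowalign; in real use you
# would pass flowalign.yield_aligned(input=..., reference=...) instead.
demo_records = [("seq1", "ACGT--NN"), ("seq2", "AC-TACGT")]
buf = io.StringIO()
written = write_fasta(demo_records, buf, width=4)
print(buf.getvalue(), end="")
```

With the default width=None each sequence stays on one line, matching the print loop above; for a file you would call it inside `with open("aligned.fa", "w") as out:`.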

yield_aligned can also take an open text stream instead of a filename, e.g.:

aligned = flowalign.yield_aligned(input=open("sequences.fasta", "rt"), reference="wuhCor1.fa")
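The page does not say how yield_aligned tells a filename from a stream. A common Python pattern for accepting either is sketched below: open (and close) paths yourself, but pass caller-owned streams through untouched. as_text_stream and read_fasta are hypothetical helpers illustrating the pattern, not flowalign's implementation.

```python
import contextlib
import io
import os
from typing import Iterator, TextIO, Tuple, Union

Source = Union[str, os.PathLike, TextIO]


@contextlib.contextmanager
def as_text_stream(src: Source) -> Iterator[TextIO]:
    """Give back a text stream for either a path or an already-open stream."""
    if isinstance(src, (str, os.PathLike)):
        with open(src, "rt") as fh:  # we opened it, so we close it
            yield fh
    else:
        yield src  # the caller owns this stream; leave it open


def read_fasta(src: Source) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA path or text stream."""
    with as_text_stream(src) as fh:
        name, chunks = None, []
        for line in fh:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if name is not None:
                    yield name, "".join(chunks)
                name, chunks = line[1:], []
            else:
                chunks.append(line)
        if name is not None:
            yield name, "".join(chunks)


stream = io.StringIO(">a desc\nACG\nT\n\n>b\nNNNN\n")
records = list(read_fasta(stream))
print(records)  # [('a desc', 'ACGT'), ('b', 'NNNN')]
```

Leaving a passed-in stream open lets the caller decide its lifetime, which is why only the path branch uses `with open(...)`.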

Under the hood, the alignment done through mappy is roughly equivalent to running minimap2 with --secondary=no --sam-hit-only --score-N=0 -x asm5.
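For comparison, the flags listed above can be turned into a standalone minimap2 command, for example to produce a SAM file for sam_2_fasta or gofasta. This is a sketch under assumptions: the -a flag (SAM output) and the helper names minimap2_command and run_minimap2 are our additions, and we have not checked that flowalign's mappy settings reproduce this command exactly.

```python
import shlex
import shutil
import subprocess
from typing import List


def minimap2_command(reference: str, query: str) -> List[str]:
    """Build a standalone minimap2 command line using the flags listed above.

    "-a" is added here so that minimap2 writes SAM, the input format for
    tools such as sam_2_fasta or gofasta.
    """
    return [
        "minimap2", "-a",
        "-x", "asm5",        # assembly-to-reference preset for low-divergence sequences
        "--secondary=no",    # do not report secondary alignments
        "--sam-hit-only",    # leave unmapped queries out of the SAM output
        "--score-N=0",       # no mismatch penalty against ambiguous bases (N)
        reference, query,
    ]


def run_minimap2(reference: str, query: str, sam_path: str) -> None:
    """Run the command above, writing SAM to sam_path (needs minimap2 on PATH)."""
    if shutil.which("minimap2") is None:
        raise FileNotFoundError("minimap2 executable not found on PATH")
    with open(sam_path, "w") as out:
        subprocess.run(minimap2_command(reference, query), stdout=out, check=True)


cmd = minimap2_command("wuhCor1.fa", "sequences.fasta")
print(shlex.join(cmd))
```

Building the argument list separately from running it keeps the command inspectable and avoids shell quoting issues.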

Note that the multiprocessing implementation is fairly hacky and may cause issues. It is expected that yield_aligned will be called only once at any one time.
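If you want your own code to enforce the "one call at a time" expectation, one option is a small guard that refuses to start a second run while another is still being iterated. This is a hypothetical user-side sketch (one_at_a_time and fake_yield_aligned are illustrative names, not part of flowalign): it only prevents overlapping iteration within one Python process, including across threads, and does nothing about separate processes.

```python
import functools
import threading
from typing import Callable, Iterator, TypeVar

T = TypeVar("T")


def one_at_a_time(gen_fn: Callable[..., Iterator[T]]) -> Callable[..., Iterator[T]]:
    """Wrap an iterator-returning function so only one run is active at once."""
    lock = threading.Lock()

    @functools.wraps(gen_fn)
    def guarded(*args, **kwargs) -> Iterator[T]:
        # The lock is taken on the first next(), not when guarded() is called.
        if not lock.acquire(blocking=False):
            raise RuntimeError(f"{gen_fn.__name__} is already being iterated")
        try:
            yield from gen_fn(*args, **kwargs)
        finally:
            lock.release()  # runs on exhaustion, on error, or on close()

    return guarded


def fake_yield_aligned(names):
    """Hypothetical stand-in for yield_aligned, yielding (name, sequence)."""
    for name in names:
        yield name, "ACGT"


safe_align = one_at_a_time(fake_yield_aligned)
first = safe_align(["a", "b"])
print(next(first))               # ('a', 'ACGT'); the lock is now held
try:
    next(safe_align(["c"]))      # a second, overlapping run is refused
    overlap_refused = False
except RuntimeError:
    overlap_refused = True
rest = list(first)               # finishing the first run releases the lock
later = list(safe_align(["c"]))  # now a new run is allowed
```

Because the wrapped call happens inside the guard, any setup work the real function does at call time would also run only after the lock is acquired.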

Acknowledgements

I developed this package, but it owes almost everything to sam_2_fasta by Ben Jackson at the University of Edinburgh: major functions are taken from that codebase (Ben has also ported this code to gofasta). sam_2_fasta is typically run on a SAM file produced by minimap2, whereas flowalign performs the alignment step itself using mappy, the Python bindings for minimap2. The idea is that you need no dependencies other than Python packages (mappy bundles its own copy of minimap2) to get aligned sequences.
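The general idea of turning a minimap2 alignment into a reference-aligned sequence can be sketched in pure Python: walk the CIGAR string, copy aligned bases into reference slots, mark deletions with '-', and drop insertions so every output has the reference's length. This is our own minimal illustration, not code from sam_2_fasta, gofasta or flowalign. It handles a single alignment per query (no merging of supplementary hits), assumes the sequence is already on the reference's forward strand as in SAM, and the pad parameter for uncovered ends is an assumption; the real tools may pad differently.

```python
import re
from typing import List, Tuple

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar: str) -> List[Tuple[int, str]]:
    """Split a CIGAR string such as '2S3M1I' into [(2, 'S'), (3, 'M'), (1, 'I')]."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    if "".join(f"{n}{op}" for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    return ops


def project_to_reference(ref_len: int, ref_start: int, cigar: str, seq: str,
                         pad: str = "-") -> str:
    """Lay one aligned query onto reference coordinates.

    ref_start is 0-based (SAM POS minus 1). Positions the query does not
    cover are filled with pad, deletions become '-', and insertions
    relative to the reference are dropped so the result has length ref_len.
    """
    out = [pad] * ref_len
    r, q = ref_start, 0
    for length, op in parse_cigar(cigar):
        if op in "M=XDN" and r + length > ref_len:
            raise ValueError("alignment runs past the end of the reference")
        if op in "M=XIS" and q + length > len(seq):
            raise ValueError("CIGAR consumes more bases than the query has")
        if op in "M=X":      # aligned bases: copy query onto reference slots
            out[r:r + length] = seq[q:q + length]
            r += length
            q += length
        elif op in "DN":     # reference bases missing from the query
            out[r:r + length] = "-" * length
            r += length
        elif op in "IS":     # insertions and soft clips: skip query bases
            q += length
        # 'H' and 'P' consume neither sequence
    return "".join(out)


projected = project_to_reference(10, 2, "2S3M1I2M1D2M", "xxACGTTAGC")
print(projected)  # --ACGTA-GC
```

Dropping insertions is what keeps every output the same length as the reference, so the results line up column by column as a multiple alignment.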

Download files

Source Distribution

flowalign-0.10.tar.gz (8.5 kB)

Uploaded Source

Built Distribution

flowalign-0.10-py3-none-any.whl (8.3 kB)

Uploaded Python 3

File details

Details for the file flowalign-0.10.tar.gz.

File metadata

  • Download URL: flowalign-0.10.tar.gz
  • Size: 8.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.16

File hashes

Hashes for flowalign-0.10.tar.gz
Algorithm    Hash digest
SHA256       84868cac6cd270b93485110c5ab4bcfaed04051d7576848a4f635dd532154e32
MD5          9ebbe5f01aa52931725fc9f78185bf72
BLAKE2b-256  544f608fced40582a37d86061e69c0b0a47fe31cbdc4cffe93d9e9e7e1d4314b

File details

Details for the file flowalign-0.10-py3-none-any.whl.

File metadata

  • Download URL: flowalign-0.10-py3-none-any.whl
  • Size: 8.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.16

File hashes

Hashes for flowalign-0.10-py3-none-any.whl
Algorithm    Hash digest
SHA256       0dc7d71dd9e584913e0f564d9e0ae56a13e5322f2872086fd468788e23ff1553
MD5          6dc02e93ba35f5037da8a98fcd5d53f9
BLAKE2b-256  5d5615b0506d4ea8ab58b9cdf7e66832b03af23a898e63bb3d3a6531c5841d03