Alignment generator using mappy and Python
Project description
FlowAlign
FlowAlign is an experimental package that simplifies realigning viral sequences to a reference using only Python packages. I develop this package, but it owes almost everything to sam_2_fasta by Ben Jackson at the University of Edinburgh; its major functions are taken from that codebase. (Ben has also ported this code to gofasta.) sam_2_fasta is typically run on a SAM file produced by minimap2. FlowAlign incorporates the alignment step itself by using mappy, the Python bindings for minimap2. The idea is that you need no dependencies other than Python packages (mappy supplies its own minimap2) to get aligned sequences.
Installation
pip install flowalign
Usage (Python)
First we download the reference, and some unaligned sequences to align to it:
wget https://hgdownload.soe.ucsc.edu/goldenPath/wuhCor1/bigZips/wuhCor1.fa.gz && gunzip wuhCor1.fa.gz
wget https://data.nextstrain.org/files/ncov/open/global/sequences.fasta.xz && xz --decompress sequences.fasta.xz
Then we write a simple Python script that creates an iterator called aligned, which yields reference-aligned versions of these unaligned sequences:
import flowalign
aligned = flowalign.yield_aligned(input="sequences.fasta", reference="wuhCor1.fa")
# Print each aligned sequence as a FASTA record
for name, aligned_sequence in aligned:
    print(">" + name)
    print(aligned_sequence)
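Because the iterator yields plain (name, sequence) pairs, it can be post-processed before anything is written out. The following is a minimal sketch, not part of flowalign: the helper write_fasta, its max_missing threshold, and the set of characters counted as missing are all hypothetical choices for illustration. It uses stand-in records so it runs without any input files; in practice the records argument would be the iterator returned by flowalign.yield_aligned.

```python
import io


def write_fasta(records, handle, max_missing=0.5, missing_chars="N-*"):
    """Write (name, sequence) pairs as FASTA, skipping mostly-empty records.

    `max_missing` and `missing_chars` are illustrative assumptions, not
    flowalign settings. Returns the number of records written.
    """
    written = 0
    for name, seq in records:
        if not seq:
            continue
        # Count positions that carry no base call
        missing = sum(seq.upper().count(c) for c in missing_chars)
        if missing / len(seq) > max_missing:
            continue
        handle.write(f">{name}\n{seq}\n")
        written += 1
    return written


# Stand-in records; in practice pass flowalign.yield_aligned(...) instead
demo = [("seq1", "ACGTACGT"), ("seq2", "NNNNNNAC"), ("seq3", "ACG--CGT")]
buffer = io.StringIO()
count = write_fasta(demo, buffer)
```

Here seq2 is dropped because six of its eight positions are N, which exceeds the 0.5 threshold.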
yield_aligned can also take a stream, e.g.:
aligned = flowalign.yield_aligned(input=open("sequences.fasta", "rt"), reference="wuhCor1.fa")
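One possible use of the stream form is to skip the separate decompression step. The sketch below defines a hypothetical helper, open_text, that opens plain, .gz, or .xz files as text streams using only the standard library. Whether yield_aligned accepts every kind of text-mode file object (rather than only the result of open) is an assumption that is not confirmed here, so treat the call shown in the comment as something to try rather than a documented feature.

```python
import gzip
import lzma


def open_text(path):
    """Open a plain, .gz, or .xz file as a text-mode stream (hypothetical helper)."""
    path = str(path)
    if path.endswith(".xz"):
        return lzma.open(path, "rt")
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


# Assumed usage (untested against flowalign itself):
# aligned = flowalign.yield_aligned(input=open_text("sequences.fasta.xz"),
#                                   reference="wuhCor1.fa")
```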
Under the hood, the mappy call is roughly equivalent to running minimap2 with --secondary=no --sam-hit-only --score-N=0 -x asm20.
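To give a feel for what happens after minimap2 reports a hit, here is a minimal sketch of projecting a query onto reference coordinates from its CIGAR. This is not the code used by flowalign or sam_2_fasta. The function name project_onto_reference, the N padding, and the choice to drop insertions so the output has the reference length are all assumptions made for illustration. The CIGAR is taken as a list of (length, operation) pairs with the numeric operation codes from the SAM specification, which is also the layout mappy uses for a hit's cigar attribute.

```python
# CIGAR operation codes as defined by the SAM specification
M, I, D, N_SKIP, S, H, P, EQ, X = range(9)


def project_onto_reference(query, cigar, r_st, q_st, ref_len, pad="N"):
    """Return `query` laid out in reference coordinates.

    cigar: list of (length, op) pairs
    r_st:  0-based reference position where the alignment starts
    q_st:  0-based query position where the alignment starts
           (use 0 if the CIGAR itself contains soft clips)
    """
    out = [pad * r_st]  # pad the unaligned start of the reference
    q = q_st
    for length, op in cigar:
        if op in (M, EQ, X):
            # Match or mismatch: copy query bases
            out.append(query[q:q + length])
            q += length
        elif op in (I, S):
            # Insertion or soft clip: bases absent from the reference are dropped
            q += length
        elif op in (D, N_SKIP):
            # Deletion or skipped region: fill reference positions with gaps
            out.append("-" * length)
        # H and P consume neither query nor reference
    aligned = "".join(out)
    # Pad the unaligned end of the reference
    return aligned + pad * (ref_len - len(aligned))


result = project_onto_reference(
    "ACGTTTGA", [(3, M), (2, I), (2, D), (3, M)], r_st=2, q_st=0, ref_len=12
)
```

In this example the two inserted bases (TT) are discarded and the two deleted reference positions become gaps, giving NNACG--TGANN.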
Note that the multiprocessing implementation is fairly hacky, which may cause issues. yield_aligned is mostly expected to have only one call active at any one time.
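One way to respect this when several input files need aligning is to chain the calls so that each one is fully consumed before the next begins. The sketch below is an illustration only: align_sequentially and fake_align are hypothetical names, and fake_align stands in for a real call so the example runs without flowalign installed.

```python
def align_sequentially(paths, align_one):
    """Chain alignments of several inputs, one at a time."""
    for path in paths:
        # align_one(path) is only called once the previous iterator is exhausted
        yield from align_one(path)


calls = []


def fake_align(path):
    """Stand-in for a real aligner call; yields one (name, sequence) pair."""
    calls.append(path)
    yield (path + ":seq1", "ACGT")


# With flowalign, align_one could be (assumed usage):
#   lambda p: flowalign.yield_aligned(input=p, reference="wuhCor1.fa")
results = list(align_sequentially(["a.fasta", "b.fasta"], fake_align))
```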
Usage (command-line)
flowalign sequences.fasta --reference wuhCor1.fa --output aligned.fa
If you omit --output, the aligned sequences are written to STDOUT.