Lineage prediction from SARS-CoV-2 sequences

Project description

FlowAlign

FlowAlign is an experimental package that simplifies realigning viral sequences to a reference using only Python packages. I develop this package, but it owes almost everything to sam_2_fasta by Ben Jackson at the University of Edinburgh: its major functions are taken from that codebase. (Ben has also ported this code to gofasta.) sam_2_fasta is typically run on a SAM file produced by minimap2. FlowAlign folds the alignment step in by using mappy, the Python bindings for minimap2, so the only dependencies needed to get aligned sequences are Python packages (mappy supplies its own minimap2).
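To give a feel for what converting a SAM record into a reference-aligned sequence involves, here is a minimal sketch. It is not the code used by FlowAlign, sam_2_fasta or gofasta. The function name project_onto_reference, the choice of "N" for unaligned flanks and "-" for deletions are all assumptions made for illustration. Insertions relative to the reference are simply dropped so that the output always has the reference's length.

```python
import re

# Matches one CIGAR operation, e.g. "29M" or "3D".
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def project_onto_reference(seq, cigar, pos, ref_len, pad="N", gap="-"):
    """Return `seq` rewritten in reference coordinates (hypothetical helper).

    seq     -- query sequence as stored in a SAM record (soft clips included)
    cigar   -- CIGAR string of the alignment
    pos     -- 1-based leftmost reference position of the alignment
    ref_len -- length of the reference sequence
    """
    out = [pad * (pos - 1)]  # reference positions before the alignment starts
    q = 0                    # current offset into the query
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op in "M=X":      # aligned bases: copy them across
            out.append(seq[q:q + n])
            q += n
        elif op in "DN":     # absent from the query: fill with gap characters
            out.append(gap * n)
        elif op in "IS":     # absent from the reference, or clipped: skip
            q += n
        # H and P consume neither query nor reference bases
    aligned = "".join(out)
    # Pad reference positions after the alignment ends
    return aligned + pad * (ref_len - len(aligned))


print(project_onto_reference("ACGTTTGA", "1S3M1I2M2D1M", pos=3, ref_len=12))
# NNCGTTG--ANN
```

Because every output has the same length as the reference, the results can be stacked directly into a multiple sequence alignment.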

Installation

pip install flowalign

Usage (Python)

First we download the reference and some unaligned sequences to align to it:

wget https://hgdownload.soe.ucsc.edu/goldenPath/wuhCor1/bigZips/wuhCor1.fa.gz && gunzip wuhCor1.fa.gz
wget https://data.nextstrain.org/files/ncov/open/global/sequences.fasta.xz && xz --decompress sequences.fasta.xz

Then we write a simple Python script, creating an iterator called aligned that will yield reference-aligned versions of these unaligned sequences:

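import flowalign
# Create an iterator over (name, aligned sequence) pairs
aligned = flowalign.yield_aligned(input="sequences.fasta", reference="wuhCor1.fa")
# Print each aligned sequence in FASTA format
for name, aligned_sequence in aligned:
    print(">" + name)
    print(aligned_sequence)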
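If you would rather write the result to a file than print it, a small helper along the following lines may be useful. This is only a sketch: write_fasta is a hypothetical name, not part of flowalign, and it assumes nothing more than an iterable of (name, sequence) pairs like the one in the loop above.

```python
import sys


def write_fasta(records, handle=None, width=60):
    """Write (name, sequence) pairs to `handle` as line-wrapped FASTA.

    `handle` defaults to standard output. Returns the number of records written.
    """
    if handle is None:
        handle = sys.stdout
    count = 0
    for name, sequence in records:
        handle.write(f">{name}\n")
        # Break the sequence into lines of at most `width` characters
        for start in range(0, len(sequence), width):
            handle.write(sequence[start:start + width] + "\n")
        count += 1
    return count


# Possible usage with the iterator created above (requires flowalign):
#   with open("aligned.fa", "w") as out:
#       write_fasta(aligned, out)
```

The helper consumes the iterator one record at a time, so the whole alignment never has to be held in memory.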

yield_aligned can also take a stream, e.g.:

aligned = flowalign.yield_aligned(input=open("sequences.fasta", "rt"), reference="wuhCor1.fa")
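One thing a stream might allow is reading the compressed download directly, without decompressing it to disk first. The sketch below opens plain, .gz or .xz files as text streams using only the standard library. open_text is a hypothetical helper, and whether yield_aligned accepts these file objects in the same way as one returned by open() is an assumption I have not verified.

```python
import gzip
import lzma


def open_text(path):
    """Open a plain, .gz or .xz file as a text stream, chosen by extension."""
    if path.endswith(".xz"):
        return lzma.open(path, "rt")
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


# Possible usage (requires flowalign; assumes any text stream is accepted):
#   aligned = flowalign.yield_aligned(input=open_text("sequences.fasta.xz"),
#                                     reference="wuhCor1.fa")
```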

Under the hood, mappy is doing roughly what minimap2 does when called with --secondary=no --sam-hit-only --score-N=0 -x asm20.

Note that the multiprocessing implementation is fairly hacky, which may cause issues. It is generally expected that only one call to yield_aligned will be active at any one time.

Usage (command-line)

flowalign sequences.fasta --reference wuhCor1.fa --output aligned.fa

If you omit --output, the aligned sequences are written to STDOUT.
