Remove outlier sequences from multiple sequence alignment

seqSieve
===========

Installation:

pip install seqSieve

This should also install numpy and matplotlib automatically if necessary.
If you have trouble installing dependencies via pip, try installing
them with your distribution's package manager.

On Debian, run:

apt-get install python-matplotlib python-numpy

It is also possible to run seqSieve without installation:

python seqSieve/seqSieve



**seqSieve** tries to remove sequences that cause misalignments from a multiple sequence alignment (MSA).
It reads a given MSA in multi-fasta format, removes the sequences with the highest penalty scores,
and then builds the next MSA without those sequences. This process is repeated until a user-specified
cut-off is reached or fewer than three sequences are left to be aligned.
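
The loop described above can be sketched in a few lines of Python. This is an illustration only, not seqSieve's actual code: the function name `sieve` and the `align` and `score` callables are hypothetical placeholders for an external aligner and a penalty mode, and two details are assumptions, namely that one sequence is dropped per iteration and that the loop also stops once no sequence carries a positive penalty.

```python
def sieve(sequences, align, score, max_iterations=None):
    """Iteratively drop the highest-penalty sequence (illustrative sketch).

    sequences: dict mapping sequence name -> unaligned sequence
    align:     hypothetical callable, dict of sequences -> dict of aligned rows
    score:     hypothetical callable, dict of aligned rows -> dict of penalties
    """
    remaining = dict(sequences)
    removed = []
    iteration = 0
    # Stop as soon as fewer than three sequences are left to be aligned.
    while len(remaining) >= 3:
        if max_iterations is not None and iteration >= max_iterations:
            break
        alignment = align(remaining)
        penalties = score(alignment)
        worst = max(penalties, key=penalties.get)
        # Assumption: nothing left to remove once no sequence is penalized.
        if penalties[worst] <= 0:
            break
        del remaining[worst]
        removed.append(worst)
        iteration += 1
    return remaining, removed
```

With a real aligner, `align` would write the remaining sequences to a multi-fasta file, run the aligner, and read the result back.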

In the default mode "Sites", sequences are penalized for both gaps and insertions, by an amount proportional to the percentage of ungapped and gapped sequences, respectively.
The modes "Gaps", "uGaps", "Insertions", "uInsertions" and "uInsertionsGaps" always assign a penalty of 1 for the named variation. "u" stands for unique, i.e. "uGaps" only penalizes unique gaps.
With mode "custom", the user sets the penalties for each variation.
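
One possible reading of the "Sites" and "uGaps" modes, written as a Python sketch. The function names and the per-column interpretation are assumptions made for illustration and may differ from what seqSieve computes: here a gap in a column costs the fraction of sequences that are ungapped in that column, a residue costs the fraction that are gapped, and a "unique gap" is a column in which exactly one sequence has a gap.

```python
def sites_penalties(alignment):
    """Assumed 'Sites'-style scoring: penalties weighted by column composition."""
    names = list(alignment)
    rows = [alignment[name] for name in names]
    penalties = {name: 0.0 for name in names}
    for column in zip(*rows):
        gapped = column.count("-") / len(column)   # fraction of gapped sequences
        ungapped = 1.0 - gapped                    # fraction of ungapped sequences
        for name, char in zip(names, column):
            # Gap: penalized by the ungapped share. Residue: by the gapped share.
            penalties[name] += ungapped if char == "-" else gapped
    return penalties


def unique_gap_penalties(alignment):
    """Assumed 'uGaps'-style scoring: penalty of 1 per gap no other sequence shares."""
    names = list(alignment)
    rows = [alignment[name] for name in names]
    penalties = {name: 0 for name in names}
    for column in zip(*rows):
        if column.count("-") == 1:
            penalties[names[column.index("-")]] += 1
    return penalties
```

Under this reading, a sequence that is gapped where most others are not, or that carries residues where most others are gapped, accumulates the largest penalty.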

Usage:

######################################
# seqSieve
######################################
usage:
    seqSieve -f multifasta alignment
options:
    -f, --fasta=FILE                    multifasta alignment (eg. "align.fas")
    OR
    -F, --fasta_dir=DIR                 directory with multifasta files (needs -s SUFFIX)
    -s, --suffix=SUFFIX                 will try to work with files that end with SUFFIX (eg. ".fas")

    -a, --msa_tool=STR                  supported: "mafft", "prank", "prankf" (= prank +F) [default: "mafft"]
    -i, --max_iterations=NUM            force stop after NUM iterations
    -n, --num_threads=NUM               max number of threads to be executed in parallel [default: 1]
    -m, --mode=MODE                     set strategy to remove outlier sequences [default: "Sites"]
                                        available modes (not case sensitive):
                                        "Sites", "Gaps", "uGaps", "Insertions",
                                        "uInsertions", "uInsertionsGaps", "custom"
    -q, --no-realign                    don't realign with each iteration (not recommended)
    -l, --log                           write logfile
    -h, --help                          prints this

only for mode "custom":
    -g, --gap_penalty=NUM               set gap penalty [default: 1.0]
    -G, --unique_gap_penalty=NUM        set unique gap penalty [default: 10.0]
    -j, --insertion_penalty=NUM         set insertion penalty [default: 1.0]
    -J, --unique_insertion_penalty=NUM  set unique insertion penalty [default: 1.0]
    -M, --mismatch_penalty=NUM          set mismatch penalty [default: 1.0]
    -r, --match_reward=NUM              set match reward [default: -10.0]
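
For batch runs it can help to assemble the command line in a script. The following Python 3 sketch only builds the argument list from options documented above. The helper name `build_seqsieve_command` and the directory name `alignments` are made up for this example, and it assumes that `seqSieve` is on the `PATH`.

```python
import shlex


def build_seqsieve_command(fasta_dir, suffix, msa_tool="mafft", mode="Sites",
                           num_threads=1, max_iterations=None, log=False):
    """Build an argument list for a directory run of seqSieve (hypothetical helper)."""
    cmd = ["seqSieve",
           "-F", fasta_dir,         # directory with multifasta files
           "-s", suffix,            # required together with -F
           "-a", msa_tool,
           "-m", mode,
           "-n", str(num_threads)]
    if max_iterations is not None:
        cmd += ["-i", str(max_iterations)]
    if log:
        cmd.append("-l")
    return cmd


# Example: all *.fas files in ./alignments, four threads, with a logfile.
cmd = build_seqsieve_command("alignments", ".fas", num_threads=4, log=True)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once seqSieve and the chosen aligner are installed.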


Currently supported multiple sequence aligners:

- mafft (Katoh, Standley 2013, Molecular Biology and Evolution 30:772-780).
  MAFFT multiple sequence alignment software version 7: improvements in performance and usability. http://mafft.cbrc.jp/alignment/software/
- prank (Loytynoja, Goldman 2005, PNAS 102:10557-10562).
  An algorithm for progressive multiple alignment of sequences with insertions. http://www.ebi.ac.uk/goldman-srv/prank/prank/

Requirements
============
* matplotlib
* numpy

External Programs
-----------------
* mafft and/or
* prank
